Exploring Latent Space with a Variational Autoencoder in TensorFlow

This article explains the theory behind variational autoencoders, details their KL‑divergence loss, provides a complete TensorFlow implementation, and demonstrates reconstruction, latent‑space visualization, and novel image generation through sampling and interpolation.

Variational Autoencoders (VAEs) extend ordinary autoencoders by learning a distribution over the latent space instead of fixed vectors, enabling smooth interpolation. The article first explains the VAE principle, describing how a multivariate Gaussian prior is imposed on the latent variables.

The loss consists of a reconstruction term (MSE scaled by 1000) and a KL‑divergence term that measures how far the learned Gaussian departs from the unit Gaussian prior; the objective is written out below.
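In standard form (reconstructed here to match the loss implemented in the code that follows), the objective per image is

\mathcal{L} = 1000 \cdot \mathbb{E}\left[ \lVert x - \hat{x} \rVert^{2} \right] \;-\; \frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_{j}^{2} - \mu_{j}^{2} - \sigma_{j}^{2} \right)

where the second term is D_{\mathrm{KL}}\left( \mathcal{N}(\mu, \sigma^{2}) \,\|\, \mathcal{N}(0, I) \right), \mu and \log \sigma^{2} are the encoder's two output heads, and d = 256 is the latent dimension. The implementation averages rather than sums over the d components, which only rescales the KL term.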

Implementation steps are provided for TensorFlow ≥ 2.6. The dataset is a Kaggle fashion‑product collection downloaded from a short URL. Data loading functions resize, random‑crop, and normalize images to 128×128×3. The encoder stacks three Conv2D‑BatchNormalization layers with stride 2, flattens the feature map, and outputs mean and log‑variance vectors (latent_dim = 256). A sampling lambda function draws latent vectors using K.random_normal. The decoder mirrors the encoder with Conv2DTranspose layers and a sigmoid output.
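The data-loading functions themselves are not reproduced in this summary. A minimal sketch of such a pipeline, assuming a flat directory of JPEG files (the path, crop margin, and batch size here are illustrative):

import tensorflow as tf

IMG_SIZE = 128

def load_and_preprocess(path):
    # Decode, resize slightly larger than the target, random-crop to
    # 128x128, and scale pixel values to [0, 1].
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (IMG_SIZE + 16, IMG_SIZE + 16))
    img = tf.image.random_crop(img, (IMG_SIZE, IMG_SIZE, 3))
    return tf.cast(img, tf.float32) / 255.0

paths = tf.data.Dataset.list_files('fashion_images/*.jpg')  # illustrative path
dataset = (paths
           .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))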

Model architecture is illustrated in Figure 2.

from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose,
                                     BatchNormalization, Flatten, Dense,
                                     Reshape, Lambda)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K

latent_dim = 256

# Encoder: three stride-2 Conv2D + BatchNormalization blocks,
# downsampling 128 -> 64 -> 32 -> 16.
input_img = Input(shape=(128, 128, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same', strides=2)(input_img)
x = BatchNormalization()(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same', strides=2)(x)
x = BatchNormalization()(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same', strides=2)(x)
x = BatchNormalization()(x)
shape_before_flattening = K.int_shape(x)  # (None, 16, 16, 4), kept for reference
x = Flatten()(x)

# Two heads: the mean and the log-variance of the latent Gaussian.
z_mu = Dense(latent_dim)(x)
z_log_sigma = Dense(latent_dim, kernel_initializer='zeros',
                    bias_initializer='zeros')(x)

def sampling(args):
    # Reparameterization trick: z = mu + sigma * epsilon, epsilon ~ N(0, I).
    z_mu, z_log_sigma = args
    epsilon = K.random_normal(shape=(K.shape(z_mu)[0], latent_dim),
                              mean=0., stddev=1.)
    # z_log_sigma is the log-variance (matching the KL term below),
    # so the standard deviation is exp(0.5 * log_var).
    return z_mu + K.exp(0.5 * z_log_sigma) * epsilon

z = Lambda(sampling)([z_mu, z_log_sigma])
encoder = Model(input_img, z)

# Decoder: mirror the encoder with stride-2 Conv2DTranspose blocks,
# upsampling 8 -> 16 -> 32 -> 64 -> 128.
decoder_input = Input(K.int_shape(z)[1:])
x = Dense(4096, activation='relu')(decoder_input)
x = Reshape((8, 8, 64))(x)  # 8 * 8 * 64 = 4096
x = Conv2DTranspose(32, (3, 3), strides=2, padding='same')(x)
x = BatchNormalization()(x)
x = Conv2DTranspose(16, (3, 3), strides=2, padding='same')(x)
x = BatchNormalization()(x)
x = Conv2DTranspose(8, (3, 3), strides=2, padding='same')(x)
x = BatchNormalization()(x)
# Sigmoid keeps pixel values in [0, 1], matching the normalized inputs.
x = Conv2DTranspose(3, (3, 3), strides=2, padding='same', activation='sigmoid')(x)
decoder = Model(decoder_input, x)

# End-to-end VAE: encode, sample, decode.
pred = decoder(z)
vae = Model(input_img, pred)

def vae_loss(x, pred):
    x = K.flatten(x)
    pred = K.flatten(pred)
    # Reconstruction term: MSE scaled by 1000 to balance the KL term.
    reconst_loss = 1000 * K.mean(K.square(x - pred))
    # KL divergence between N(z_mu, exp(z_log_sigma)) and the unit Gaussian prior.
    kl_loss = -0.5 * K.mean(1 + z_log_sigma - K.square(z_mu) - K.exp(z_log_sigma), axis=-1)
    return reconst_loss + kl_loss

# The loss depends on z_mu and z_log_sigma, not just inputs and outputs,
# so it is attached with add_loss rather than passed to compile().
vae.add_loss(vae_loss(input_img, pred))
optimizer = Adam(learning_rate=0.0005)
vae.compile(optimizer=optimizer, loss=None)

Training uses early stopping (patience = 10) for up to 50 epochs. After training, the latent vectors of validation images are projected and visualized with t‑SNE. Reconstruction results are shown in Figure 4.
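The training and visualization calls are not reproduced in the source. A sketch matching the stated settings, assuming train_dataset and val_dataset come from a pipeline like the one above (the t-SNE perplexity is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from tensorflow.keras.callbacks import EarlyStopping

# Up to 50 epochs, stopping once validation loss stalls for 10 epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)
vae.fit(train_dataset, validation_data=val_dataset,
        epochs=50, callbacks=[early_stop])

# Encode the validation images and embed the latent vectors in 2-D.
# Note: this encoder returns sampled z vectors rather than the means.
val_images = np.concatenate([b.numpy() for b in val_dataset], axis=0)
z_points = encoder.predict(val_images)
z_2d = TSNE(n_components=2, perplexity=30).fit_transform(z_points)

plt.scatter(z_2d[:, 0], z_2d[:, 1], s=3)
plt.title('t-SNE projection of validation latent vectors')
plt.show()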

To generate new images, the encoder projects source and target images into the latent space, averages their vectors, and feeds the result to the decoder, producing hybrid images (Figure 5). Linear interpolation between two latent vectors yields a sequence of intermediate images (Figures 6 and 7). The article notes that because the dataset contains diverse product categories, the VAE learns multiple sub‑distributions rather than a single global one.
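A sketch of those two operations, assuming src_img and tgt_img are preprocessed arrays of shape (1, 128, 128, 3):

import numpy as np

# Project both images into the latent space.
z_src = encoder.predict(src_img)
z_tgt = encoder.predict(tgt_img)

# Hybrid image: decode the average of the two latent vectors.
hybrid = decoder.predict((z_src + z_tgt) / 2.0)

# Linear interpolation: decode a sequence of intermediate latent vectors.
steps = np.linspace(0.0, 1.0, num=10)
frames = [decoder.predict((1.0 - t) * z_src + t * z_tgt) for t in steps]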

In summary, the guide demonstrates how to build a VAE in TensorFlow, explains the KL‑divergence loss, and showcases reconstruction, latent‑space visualization, and novel image synthesis through sampling and interpolation.
