Exploring Latent Space with a Variational Autoencoder in TensorFlow
This article explains the theory behind variational autoencoders, details their KL‑divergence loss, provides a complete TensorFlow implementation, and demonstrates reconstruction, latent‑space visualization, and novel image generation through sampling and interpolation.
Variational Autoencoders (VAEs) extend ordinary autoencoders by learning a distribution over the latent space instead of fixed vectors, enabling smooth interpolation. The article first explains the VAE principle, describing how a multivariate Gaussian prior is imposed on the latent variables.
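The loss consists of a reconstruction term (mean squared error scaled by 1000) and a KL-divergence term that measures the difference between the learned Gaussian N(μ, σ²) and the unit Gaussian prior N(0, I). For a diagonal Gaussian this term has the closed form −½ Σ (1 + log σ² − μ² − σ²), which the implementation averages over the latent dimensions rather than summing.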
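To make the KL term concrete, here is a minimal sketch in plain Python (no TensorFlow required) of the closed-form divergence between a diagonal Gaussian and the unit Gaussian. The function name `kl_to_unit_gaussian` is hypothetical, and the sketch sums over latent dimensions, whereas the article's loss takes the mean, which only rescales the term.

```python
import math

def kl_to_unit_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector.

    Equivalent to -0.5 * sum(1 + log_var - mu^2 - exp(log_var)).
    Hypothetical helper for illustration; it sums over dimensions.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# The divergence vanishes when the encoder output equals the prior ...
kl_at_prior = kl_to_unit_gaussian([0.0, 0.0], [0.0, 0.0])
# ... and grows as the mean moves away from the origin.
kl_shifted = kl_to_unit_gaussian([1.0, 0.0], [0.0, 0.0])
print(kl_at_prior, kl_shifted)
```

The term acts as a regularizer: it pulls every encoded distribution toward the same prior, which is what keeps neighbouring regions of the latent space decodable.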
Implementation steps are provided for TensorFlow ≥ 2.6. The dataset is a Kaggle fashion-product collection downloaded from a short URL. Data loading functions resize, random-crop, and normalize images to 128×128×3. The encoder stacks three Conv2D-BatchNormalization blocks with stride 2, which reduce the 128×128 input to a 16×16×4 feature map, flattens that map, and outputs mean and log-variance vectors (latent_dim = 256). A Lambda layer wraps a sampling function that draws latent vectors using K.random_normal. The decoder roughly mirrors the encoder: a Dense layer reshaped to 8×8×64 is upsampled by four stride-2 Conv2DTranspose layers, the last with a sigmoid output, back to 128×128×3.
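The sampling step is the reparameterization trick: the randomness is isolated in a standard-normal noise vector so that gradients can flow through the mean and log-variance. The sketch below shows it in plain Python for a single latent vector. It assumes the second encoder output is a log-variance, as described above, and the name `sample_latent` is hypothetical.

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    log_var is the log-variance, so sigma = exp(0.5 * log_var).
    Hypothetical helper; the article does this inside a Keras Lambda layer.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
mu = [2.0] * 4
z_narrow = sample_latent(mu, [-50.0] * 4, rng)  # sigma ~ 1e-11: z is almost exactly mu
z_wide = sample_latent(mu, [0.0] * 4, rng)      # sigma = 1: z scatters around mu
print(z_narrow)
print(z_wide)
```

With a very negative log-variance the sample collapses onto the mean, which is the behaviour of an ordinary autoencoder. The KL term is what discourages the encoder from shrinking the variance that far.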
Model architecture is illustrated in Figure 2.
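The tensor shapes in the listing can be checked by hand. The sketch below traces the spatial size through the encoder and decoder using the rule that, with padding='same', a stride-s convolution produces ceil(size / s) outputs and a stride-s transposed convolution produces size × s. The helper names are hypothetical; the filter counts are taken from the listing.

```python
import math

def conv_same(size, stride):
    """Spatial size after a Conv2D with padding='same'."""
    return math.ceil(size / stride)

def deconv_same(size, stride):
    """Spatial size after a Conv2DTranspose with padding='same'."""
    return size * stride

# Encoder: three stride-2 convolutions with 32, 16 and 4 filters.
size = 128
for filters in (32, 16, 4):
    size = conv_same(size, 2)
    print(f"encoder Conv2D({filters}): {size}x{size}x{filters}")
flat_units = size * size * 4
print("flattened units:", flat_units)

# Decoder: Dense(4096) reshaped to 8x8x64, then four stride-2 transposed convolutions.
size = 8
for filters in (32, 16, 8, 3):
    size = deconv_same(size, 2)
    print(f"decoder Conv2DTranspose({filters}): {size}x{size}x{filters}")
decoder_out = (size, size, 3)
```

The trace shows why the decoder needs four upsampling layers although the encoder has only three downsampling layers: it starts from 8×8 rather than from the encoder's final 16×16 map.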
from tensorflow.keras.layers import (
    Input, Conv2D, Conv2DTranspose, BatchNormalization,
    Flatten, Dense, Lambda, Reshape,
)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K

latent_dim = 256

# Encoder: three stride-2 convolutions halve the resolution each time (128 -> 64 -> 32 -> 16).
input_img = Input(shape=(128, 128, 3))
x = Conv2D(32, (3,3), activation='relu', padding='same', strides=2)(input_img)
x = BatchNormalization()(x)
x = Conv2D(16, (3,3), activation='relu', padding='same', strides=2)(x)
x = BatchNormalization()(x)
x = Conv2D(4, (3,3), activation='relu', padding='same', strides=2)(x)
x = BatchNormalization()(x)
shape_before_flattening = K.int_shape(x)  # (None, 16, 16, 4)
x = Flatten()(x)
# Two heads: the mean and the log-variance of the latent Gaussian.
z_mu = Dense(latent_dim)(x)
z_log_sigma = Dense(latent_dim, kernel_initializer='zeros', bias_initializer='zeros')(x)
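
def sampling(args):
    # Reparameterization trick: z = mu + sigma * epsilon, epsilon ~ N(0, I).
    z_mu, z_log_sigma = args
    epsilon = K.random_normal(shape=(K.shape(z_mu)[0], latent_dim), mean=0., stddev=1.)
    # z_log_sigma holds the log-variance (as the KL term assumes), so sigma = exp(0.5 * log-variance).
    return z_mu + K.exp(0.5 * z_log_sigma) * epsilon

z = Lambda(sampling)([z_mu, z_log_sigma])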
encoder = Model(input_img, z)

# Decoder: expand the latent vector to 8x8x64, then upsample 8 -> 16 -> 32 -> 64 -> 128.
decoder_input = Input(shape=K.int_shape(z)[1:])
x = Dense(4096, activation='relu')(decoder_input)
x = Reshape((8,8,64))(x)
x = Conv2DTranspose(32, (3,3), strides=2, padding='same')(x)
x = BatchNormalization()(x)
x = Conv2DTranspose(16, (3,3), strides=2, padding='same')(x)
x = BatchNormalization()(x)
x = Conv2DTranspose(8, (3,3), strides=2, padding='same')(x)
x = BatchNormalization()(x)
x = Conv2DTranspose(3, (3,3), strides=2, padding='same', activation='sigmoid')(x)
decoder = Model(decoder_input, x)

# End-to-end VAE: encode, sample, decode.
pred = decoder(z)
vae = Model(input_img, pred)

def vae_loss(x, pred):
    # Reconstruction term: mean squared error over all pixels, scaled by 1000.
    x = K.flatten(x)
    pred = K.flatten(pred)
    reconst_loss = 1000 * K.mean(K.square(x - pred))
    # KL term per sample, averaged over the latent dimensions.
    kl_loss = -0.5 * K.mean(1 + z_log_sigma - K.square(z_mu) - K.exp(z_log_sigma), axis=-1)
    # Average over the batch so the added loss is a scalar.
    return K.mean(reconst_loss + kl_loss)

vae.add_loss(vae_loss(input_img, pred))
optimizer = Adam(learning_rate=0.0005)
vae.compile(optimizer=optimizer, loss=None)

Training uses early stopping (patience = 10) for up to 50 epochs. After training, the latent vectors of validation images are projected and visualized with t-SNE. Reconstruction results are shown in Figure 4.
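The patience rule behind early stopping can be illustrated without any framework. The sketch below returns the epoch at which training would halt for a given sequence of validation losses. The article does not state which metric is monitored, so validation loss is an assumption here, and both the function name `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return the 1-based epoch at which training stops, or None if it never stops early.

    Training stops once the monitored loss has failed to improve
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # new best value: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: improve for 5 epochs, then plateau for the rest of 50.
history = [1.0, 0.8, 0.7, 0.65, 0.6] + [0.61] * 45
stop_epoch = early_stopping_epoch(history, patience=10)
print(stop_epoch)
```

In Keras the same behaviour is available through the `tf.keras.callbacks.EarlyStopping` callback, for example with `monitor="val_loss"` and `patience=10`, passed to `fit` through its `callbacks` argument together with `epochs=50`.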
To generate new images, the encoder projects source and target images into the latent space, averages their vectors, and feeds the result to the decoder, producing hybrid images (Figure 5). Linear interpolation between two latent vectors yields a sequence of intermediate images (Figures 6 and 7). The article notes that because the dataset contains diverse product categories, the VAE learns multiple sub‑distributions rather than a single global one.
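Averaging and interpolation are the same operation at different mixing weights. The following sketch, in plain Python with toy three-dimensional vectors standing in for the 256-dimensional encoder outputs, builds a hybrid vector and an evenly spaced path between two latent vectors. The function names `lerp` and `interpolation_path` are hypothetical.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latent vectors from z_a to z_b inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]

# Toy vectors standing in for the encoder outputs of a source and a target image.
z_source = [0.0, 2.0, -1.0]
z_target = [4.0, 0.0, 1.0]

hybrid = lerp(z_source, z_target, 0.5)  # identical to averaging the two vectors
path = interpolation_path(z_source, z_target, steps=5)
print(hybrid)
for z in path:
    print(z)
```

Each vector on the path would then be passed through the decoder to obtain one frame of the interpolation sequence, and the midpoint of the path is the hybrid image.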
In summary, the guide demonstrates how to build a VAE in TensorFlow, explains the KL‑divergence loss, and showcases reconstruction, latent‑space visualization, and novel image synthesis through sampling and interpolation.
Code DAO
We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
