Exploring Latent Space with TensorFlow Autoencoders (Part 1)
This tutorial walks through building a TensorFlow 2.0 autoencoder from scratch, preparing the FashionDB dataset, visualizing raw images, projecting them into PCA and t‑SNE spaces, constructing encoder and decoder layers, training the model, and visualizing the resulting latent space to reveal image clusters.
This article is the first installment of a series on unsupervised/self‑supervised learning with deep neural networks. It focuses on implementing a latent‑space representation using an autoencoder built with TensorFlow 2.0 and visualizing that space with t‑SNE embeddings.
1. Latent space concept – Real‑world data are high‑dimensional, which hampers computation and feature modeling. By assuming that high‑dimensional data lie on a low‑dimensional manifold, a "latent space" can be defined that captures the intrinsic structure of an image dataset.
2. Autoencoder overview – An autoencoder consists of an encoder, a latent vector, and a decoder. The encoder maps an input image to a low‑dimensional latent vector; the decoder reconstructs the image from that vector. Training minimizes reconstruction loss, and the converged latent vectors embed the images in the latent space.
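The training objective described above can be illustrated with a toy computation: mean-squared reconstruction error between an input and its reconstruction. The tensors here are random stand-ins, not real data; the actual model is built in sections 7–9.

```python
import tensorflow as tf

# Toy illustration of the reconstruction loss: x is a fake "input image",
# x_hat a fake "reconstruction" (in training, x_hat = decoder(encoder(x))).
x = tf.random.uniform((1, 60, 60, 3))
x_hat = tf.random.uniform((1, 60, 60, 3))
loss = tf.reduce_mean(tf.square(x - x_hat))  # mean-squared error
print(float(loss))
```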
3. Data preparation
import tensorflow as tf
from matplotlib import pyplot as plt
import pathlib

DB_PATH = "your_dir//fashiondb//images"
BUFFER_SIZE = 10000
BATCH_SIZE = 1000
IMG_WIDTH = 60
IMG_HEIGHT = 60

def load(image_file):
    image = tf.io.read_file(image_file)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.cast(image, tf.float32)

def random_crop(input_image):
    return tf.image.random_crop(input_image, size=[IMG_HEIGHT, IMG_WIDTH, 3])

def resize(input_image):
    return tf.image.resize(input_image, [IMG_HEIGHT, IMG_WIDTH], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)

def normalize(input_image):
    return input_image / 255.0

The script loads all JPEG files, splits them into training (80%) and validation (20%) sets, shuffles them, and batches them for model input.
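The split/shuffle/batch step just described can be sketched as follows. Synthetic tensors stand in for the JPEG files so the snippet runs on its own; with real data, start from `tf.data.Dataset.list_files(DB_PATH + "/*.jpg").map(load)` and the crop/resize/normalize helpers above. `N_IMAGES` and the toy batch size of 20 are assumptions for the sketch.

```python
import tensorflow as tf

N_IMAGES = 100  # hypothetical dataset size for the sketch
images = tf.random.uniform((N_IMAGES, 60, 60, 3))

# Autoencoders reconstruct their input, so each element is paired with itself.
ds = tf.data.Dataset.from_tensor_slices(images).map(lambda x: (x, x))

train_size = int(0.8 * N_IMAGES)                        # 80% for training
train_ds = ds.take(train_size).shuffle(10000).batch(20)
val_ds = ds.skip(train_size).batch(20)                  # remaining 20% for validation
```

In the article's pipeline the same pattern applies with `BUFFER_SIZE` and `BATCH_SIZE` in place of the toy constants.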
4. Raw data visualization
plt.figure(figsize=(15, 5))
for images, _ in val_ds.take(1):
    for i in range(45):
        ax = plt.subplot(3, 15, i + 1)
        plt.imshow(images[i].numpy().astype("float32"))
        plt.axis("off")

Because no labels are available, the dataset is visualized directly here; clusters are visualized later, after dimensionality reduction.
5. PCA embedding
from sklearn.decomposition import PCA
import numpy as np

pca = PCA(32)
imgs_list = []
for images, _ in val_ds.take(1):
    for i in range(BATCH_SIZE):
        img_arr = tf.keras.preprocessing.image.img_to_array(tf.image.rgb_to_grayscale(images[i]))
        imgs_list.append(img_arr.ravel())

img_mat = np.array(imgs_list)
print("Image Mat Shape:", img_mat.shape)
pca_feat = pca.fit_transform(img_mat)
print("No. of PCA Features:", pca_feat.shape)

The resulting 32-dimensional PCA features (`pca_feat`) are later fed to t-SNE for 2-D visualization.
6. t‑SNE visualization function
from PIL import Image
from sklearn.manifold import TSNE
import numpy as np

def visualize_space(X, images, outfile):
    tsne = TSNE(n_components=2, learning_rate='auto', init='random').fit_transform(X)
    tx, ty = tsne[:, 0], tsne[:, 1]
    tx = (tx - np.min(tx)) / (np.max(tx) - np.min(tx))
    ty = (ty - np.min(ty)) / (np.max(ty) - np.min(ty))
    width, height = 4000, 3000
    full_image = Image.new('RGBA', (width, height))
    for img, x, y in zip(images, tx, ty):
        tile = Image.fromarray(np.uint8(np.array(img) * 255))
        full_image.paste(tile, (int((width - img.shape[1]) * x), int((height - img.shape[0]) * y)), mask=tile.convert('RGBA'))
    plt.figure(figsize=(66, 50))
    plt.imshow(full_image)
    plt.axis('off')
    full_image.save(outfile)

# vis_imgs is the list of validation images as arrays (collected as in section 10)
visualize_space(np.array(pca_feat), vis_imgs, "tSNE-PCA-fashiondb.png")

The generated image shows distinct clusters; objects with low similarity appear far apart.
7. Encoder architecture
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K
latent_dim = 32
input_img = Input(shape=(60, 60, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPool2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPool2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
x = MaxPool2D((2, 2), padding='same')(x)
x = Conv2D(1, (3, 3), activation='relu', padding='same')(x)
shape_before_flattening = K.int_shape(x)
print(shape_before_flattening)
x = Flatten()(x)
x = Dense(64, activation='relu')(x)
Z = Dense(latent_dim)(x)
print(K.int_shape(Z))
encoder = Model(input_img, Z)
encoder.summary()

The encoder reduces the 60×60×3 input through stacked convolution and max-pooling layers, a flatten, and two dense layers to a 32-dimensional latent vector.
8. Decoder architecture
decoder_input = Input(K.int_shape(Z)[1:])
x = Dense(15*15*4, activation='relu', name="intermediate_decoder")(decoder_input)
x = Dense(900, activation='sigmoid', name="original_decoder")(x)
x = Reshape((15, 15, 4))(x)
x = Conv2DTranspose(3, (3, 3), padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2DTranspose(3, (3, 3), padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoder = Model(decoder_input, x)
decoder.summary()

The decoder mirrors the encoder, expanding the 32-dimensional latent vector back to a 60×60×3 image.
9. Model assembly and training
ae = Model(encoder.input, decoder(encoder.output))
# Compile with a reconstruction loss before fitting (MSE is a common choice here;
# the original snippet omitted this step, which ae.fit requires).
ae.compile(optimizer='adam', loss='mse')

from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
history = ae.fit(train_ds, epochs=10, validation_data=val_ds, callbacks=[early_stopping], verbose=1)

Training stops early if validation loss does not improve for ten consecutive epochs (note that with only 10 total epochs, `epochs` must be raised for the patience of 10 to have any effect).
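After training, the `history` object returned by `ae.fit` can be used to inspect convergence. Dummy values stand in below so the snippet runs on its own; with the real run, substitute `history.history["loss"]` and `history.history["val_loss"]`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
from matplotlib import pyplot as plt

# Dummy stand-ins for history.history recorded during ae.fit
curves = {"loss": [0.09, 0.06, 0.05], "val_loss": [0.10, 0.07, 0.06]}
plt.plot(curves["loss"], label="train loss")
plt.plot(curves["val_loss"], label="val loss")
plt.xlabel("epoch")
plt.ylabel("reconstruction loss")
plt.legend()
```

A validation curve that tracks the training curve suggests the latent representation generalizes rather than memorizing the training images.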
10. Latent‑space visualization
# Project validation set into latent space
vis_imgs = []
for input_images, _ in val_ds.take(1):
    latent_vec = encoder(input_images)
    for i in range(BATCH_SIZE):
        vis_imgs.append(tf.keras.preprocessing.image.img_to_array(input_images[i]))

print("Dimension of Latent Space:", latent_vec.shape)

The latent vectors are fed to the same t-SNE function, producing a 2-D plot in which similar images cluster together and dissimilar ones lie far apart.
Conclusion – The tutorial demonstrates how to load and preprocess an image dataset, build and train a TensorFlow autoencoder, and use PCA and t‑SNE to visualize both the original and learned latent spaces, revealing meaningful image clusters without any label information.
Code DAO
We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
