Exploring Latent Space with TensorFlow Autoencoders (Part 1)
This tutorial walks through building a TensorFlow 2.0 autoencoder from scratch, preparing the FashionDB dataset, visualizing raw images, projecting them into PCA and t‑SNE spaces, constructing encoder and decoder layers, training the model, and visualizing the resulting latent space to reveal image clusters.
This article is the first installment of a series on unsupervised/self‑supervised learning with deep neural networks. It focuses on implementing a latent‑space representation using an autoencoder built with TensorFlow 2.0 and visualizing that space with t‑SNE embeddings.
1. Latent space concept – Real‑world data are high‑dimensional, which hampers computation and feature modeling. By assuming that high‑dimensional data lie on a low‑dimensional manifold, a "latent space" can be defined that captures the intrinsic structure of an image dataset.
2. Autoencoder overview – An autoencoder consists of an encoder, a latent vector, and a decoder. The encoder maps an input image to a low‑dimensional latent vector; the decoder reconstructs the image from that vector. Training minimizes reconstruction loss, and the converged latent vectors embed the images in the latent space.
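The training objective described above can be illustrated with a toy computation: mean-squared reconstruction error between an input and its reconstruction. The tensors here are random stand-ins, not real data; the actual model is built in sections 7–9.

```python
import tensorflow as tf

# Toy illustration of the reconstruction loss: x is a fake "input image",
# x_hat a fake "reconstruction" (in training, x_hat = decoder(encoder(x))).
x = tf.random.uniform((1, 60, 60, 3))
x_hat = tf.random.uniform((1, 60, 60, 3))
loss = tf.reduce_mean(tf.square(x - x_hat))  # mean-squared error
print(float(loss))
```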
3. Data preparation
import tensorflow as tf
from matplotlib import pyplot as plt
import pathlib

DB_PATH = "your_dir//fashiondb//images"
BUFFER_SIZE = 10000
BATCH_SIZE = 1000
IMG_WIDTH = 60
IMG_HEIGHT = 60

def load(image_file):
    image = tf.io.read_file(image_file)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.cast(image, tf.float32)

def random_crop(input_image):
    return tf.image.random_crop(input_image, size=[IMG_HEIGHT, IMG_WIDTH, 3])

def resize(input_image):
    return tf.image.resize(input_image, [IMG_HEIGHT, IMG_WIDTH], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)

def normalize(input_image):
    return input_image / 255.0

The script loads all JPEG files, splits them into training (80%) and validation (20%) sets, shuffles them, and batches them for model input.
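The split/shuffle/batch step just described can be sketched as follows. Synthetic tensors stand in for the JPEG files so the snippet runs on its own; with real data, start from `tf.data.Dataset.list_files(DB_PATH + "/*.jpg").map(load)` and the crop/resize/normalize helpers above. `N_IMAGES` and the toy batch size of 20 are assumptions for the sketch.

```python
import tensorflow as tf

N_IMAGES = 100  # hypothetical dataset size for the sketch
images = tf.random.uniform((N_IMAGES, 60, 60, 3))

# Autoencoders reconstruct their input, so each element is paired with itself.
ds = tf.data.Dataset.from_tensor_slices(images).map(lambda x: (x, x))

train_size = int(0.8 * N_IMAGES)                        # 80% for training
train_ds = ds.take(train_size).shuffle(10000).batch(20)
val_ds = ds.skip(train_size).batch(20)                  # remaining 20% for validation
```

In the article's pipeline the same pattern applies with `BUFFER_SIZE` and `BATCH_SIZE` in place of the toy constants.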
4. Raw data visualization
plt.figure(figsize=(15, 5))
for images, _ in val_ds.take(1):
    for i in range(45):
        ax = plt.subplot(3, 15, i + 1)
        plt.imshow(images[i].numpy().astype("float32"))
        plt.axis("off")

Because no labels are available, the dataset is visualized directly here; clusters are visualized later, after dimensionality reduction.
5. PCA embedding
from sklearn.decomposition import PCA
import numpy as np

pca = PCA(32)
imgs_list = []
for images, _ in val_ds.take(1):
    for i in range(BATCH_SIZE):
        img_arr = tf.keras.preprocessing.image.img_to_array(tf.image.rgb_to_grayscale(images[i]))
        imgs_list.append(img_arr.ravel())

img_mat = np.array(imgs_list)
print("Image Mat Shape:", img_mat.shape)
pca_feat = pca.fit_transform(img_mat)
print("No. of PCA Features:", pca_feat.shape)

The resulting 32-dimensional PCA features (`pca_feat`) are later fed to t-SNE for 2-D visualization.
6. t‑SNE visualization function
from PIL import Image
from sklearn.manifold import TSNE
import numpy as np

def visualize_space(X, images, outfile):
    tsne = TSNE(n_components=2, learning_rate='auto', init='random').fit_transform(X)
    tx, ty = tsne[:, 0], tsne[:, 1]
    tx = (tx - np.min(tx)) / (np.max(tx) - np.min(tx))
    ty = (ty - np.min(ty)) / (np.max(ty) - np.min(ty))
    width, height = 4000, 3000
    full_image = Image.new('RGBA', (width, height))
    for img, x, y in zip(images, tx, ty):
        tile = Image.fromarray(np.uint8(np.array(img) * 255))
        full_image.paste(tile, (int((width - img.shape[1]) * x), int((height - img.shape[0]) * y)), mask=tile.convert('RGBA'))
    plt.figure(figsize=(66, 50))
    plt.imshow(full_image)
    plt.axis('off')
    full_image.save(outfile)

# vis_imgs is the list of validation images as arrays (collected as in section 10)
visualize_space(np.array(pca_feat), vis_imgs, "tSNE-PCA-fashiondb.png")

The generated image shows distinct clusters; objects with low similarity appear far apart.
7. Encoder architecture
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K
latent_dim = 32
input_img = Input(shape=(60, 60, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPool2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPool2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
x = MaxPool2D((2, 2), padding='same')(x)
x = Conv2D(1, (3, 3), activation='relu', padding='same')(x)
shape_before_flattening = K.int_shape(x)
print(shape_before_flattening)
x = Flatten()(x)
x = Dense(64, activation='relu')(x)
Z = Dense(latent_dim)(x)
print(K.int_shape(Z))
encoder = Model(input_img, Z)
encoder.summary()

The encoder reduces the 60×60×3 input through stacked convolution and max-pooling layers, a flatten, and two dense layers to a 32-dimensional latent vector.
8. Decoder architecture
decoder_input = Input(K.int_shape(Z)[1:])
x = Dense(15*15*4, activation='relu', name="intermediate_decoder")(decoder_input)
x = Dense(900, activation='sigmoid', name="original_decoder")(x)
x = Reshape((15, 15, 4))(x)
x = Conv2DTranspose(3, (3, 3), padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2DTranspose(3, (3, 3), padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoder = Model(decoder_input, x)
decoder.summary()

The decoder mirrors the encoder, expanding the 32-dimensional latent vector back to a 60×60×3 image.
9. Model assembly and training
ae = Model(encoder.input, decoder(encoder.output))
# Compile with a reconstruction loss before fitting (MSE is a common choice here;
# the original snippet omitted this step, which ae.fit requires).
ae.compile(optimizer='adam', loss='mse')

from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
history = ae.fit(train_ds, epochs=10, validation_data=val_ds, callbacks=[early_stopping], verbose=1)

Training stops early if validation loss does not improve for ten consecutive epochs (note that with only 10 total epochs, `epochs` must be raised for the patience of 10 to have any effect).
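After training, the `history` object returned by `ae.fit` can be used to inspect convergence. Dummy values stand in below so the snippet runs on its own; with the real run, substitute `history.history["loss"]` and `history.history["val_loss"]`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
from matplotlib import pyplot as plt

# Dummy stand-ins for history.history recorded during ae.fit
curves = {"loss": [0.09, 0.06, 0.05], "val_loss": [0.10, 0.07, 0.06]}
plt.plot(curves["loss"], label="train loss")
plt.plot(curves["val_loss"], label="val loss")
plt.xlabel("epoch")
plt.ylabel("reconstruction loss")
plt.legend()
```

A validation curve that tracks the training curve suggests the latent representation generalizes rather than memorizing the training images.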
10. Latent‑space visualization
# Project validation set into latent space
vis_imgs = []
for input_images, _ in val_ds.take(1):
    latent_vec = encoder(input_images)
    for i in range(BATCH_SIZE):
        vis_imgs.append(tf.keras.preprocessing.image.img_to_array(input_images[i]))

print("Dimension of Latent Space:", latent_vec.shape)

The latent vectors are fed to the same t-SNE function, producing a 2-D plot in which similar images cluster together and dissimilar ones lie far apart.
Conclusion – The tutorial demonstrates how to load and preprocess an image dataset, build and train a TensorFlow autoencoder, and use PCA and t‑SNE to visualize both the original and learned latent spaces, revealing meaningful image clusters without any label information.
Code DAO
We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
