Style‑Adversarial Autoencoder Enables Precise Content‑Style Image Generation

This paper introduces a style‑adversarial autoencoder that separates content and style latent variables, uses a multi‑class discriminator, and demonstrates superior image generation and data augmentation on MNIST, face, and text datasets.

Alibaba Cloud Developer

Aug 2, 2018

Abstract

In this paper we propose a generative adversarial network based on autoencoders, called the Style‑Adversarial Autoencoder (SAAE). Unlike conventional generative autoencoders that impose a prior on the latent space, SAAE splits the latent variable into two components, style features and content features, both encoded from real images. This decomposition allows the content and the style of a generated image to be adjusted independently by selecting different example images. A multi‑class classifier is used as the discriminator to improve realism, and a three‑step training strategy is introduced to stabilize convergence.

1 Introduction

Generative modeling of natural images is a fundamental problem in computer vision and machine learning. Early work was grounded in statistical principles but lacked effective feature representations. Deep neural networks have shown strong representation‑learning capabilities and, combined with Bayesian inference or adversarial training, have produced a variety of deep generative models. Regularization techniques such as L1 and L2 penalties and dropout improve performance. However, most autoencoder‑based generative methods still impose a prior distribution on the latent space, which limits their ability to model complex, high‑dimensional data such as colored characters or faces.

2 Style‑Adversarial Autoencoder

2.1 Generator

The generator consists of two encoders, Enc_c for content and Enc_s for style, and a decoder, Dec. Enc_c encodes a content image x_c into a latent representation z_c, Enc_s encodes a style image x_s into z_s, and Dec decodes the concatenation of z_c and z_s into the output image. We denote the whole generator as G(x_c, x_s) = Dec(Enc_c(x_c), Enc_s(x_s)).
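To make the data flow concrete, here is a minimal, non-learned toy in Python. The encoders and decoder are hypothetical stand-ins, not the paper's CNNs: the "content code" is just a binary ink mask of a content image x_c, and the "style code" is a single ink intensity taken from a style image x_s. What the sketch does show is the composition Dec(Enc_c(x_c), Enc_s(x_s)) on concatenated codes, and why picking a different example image for either slot changes only the content or only the style of the output.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]
Latent = List[float]


def enc_c(content_img: Image) -> Latent:
    """Hypothetical toy content encoder: keeps only the glyph shape (1 where ink, else 0)."""
    return [1.0 if px > 0.5 else 0.0 for row in content_img for px in row]


def enc_s(style_img: Image) -> Latent:
    """Hypothetical toy style encoder: one number, the mean intensity of the ink pixels."""
    ink = [px for row in style_img for px in row if px > 0.5]
    return [sum(ink) / len(ink) if ink else 0.0]


def dec(z: Latent, width: int) -> Image:
    """Toy decoder: reads the concatenated code [z_c..., z_s] and paints the shape with the style intensity."""
    z_c, intensity = z[:-1], z[-1]
    flat = [mask * intensity for mask in z_c]
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def generator(content_img: Image, style_img: Image) -> Image:
    """G(x_c, x_s) = Dec(concat(Enc_c(x_c), Enc_s(x_s)))."""
    z = enc_c(content_img) + enc_s(style_img)  # list concatenation stands in for latent concatenation
    return dec(z, width=len(content_img[0]))


# Usage: the same two images can play either role.
stroke = [[0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0]]   # vertical stroke drawn at full intensity
faint = [[0.75, 0.75, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.75]]   # a different shape, ink drawn at intensity 0.75

print(generator(stroke, faint))  # stroke's shape rendered at faint's intensity
print(generator(faint, stroke))  # faint's shape rendered at stroke's intensity
```

In SAAE both encoders are trained networks, so which factors end up in z_c versus z_s is shaped by the training objective rather than by a hand-coded rule like the one above.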

2.2 Discriminator

Instead of a binary discriminator, we employ a multi‑class classifier that distinguishes real images from generated ones and also predicts the specific class of each real image. This provides richer feedback for the generator and encourages more realistic synthesis.
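A (k+1)-class discriminator can be sketched as an ordinary softmax classifier with one extra "fake" class. The summary does not give the loss functions, so the objectives below are assumptions: the discriminator uses cross-entropy with each real image labeled by its class in [0, k) and each generated image labeled k, and the generator minimizes an assumed non-saturating loss, -log(1 - p_fake), which rewards generated images for being assigned to any real class. Logits are plain Python lists; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Logits = Sequence[float]  # k+1 scores: indices 0..k-1 are real classes, index k is "fake"


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cross_entropy(logits: Logits, target: int) -> float:
    """-log softmax(logits)[target]."""
    return logsumexp(logits) - logits[target]


def discriminator_loss(real_logits: List[Logits], real_labels: List[int],
                       fake_logits: List[Logits], k: int) -> float:
    """Real images -> their own class in [0, k); generated images -> the extra class k."""
    losses = [cross_entropy(l, y) for l, y in zip(real_logits, real_labels)]
    losses += [cross_entropy(l, k) for l in fake_logits]
    return sum(losses) / len(losses)


def generator_loss(fake_logits: List[Logits], k: int) -> float:
    """Assumed objective: -log(1 - p_fake), averaged over generated images.

    Computed stably as logsumexp(all logits) - logsumexp(real-class logits).
    """
    return sum(logsumexp(l) - logsumexp(l[:k]) for l in fake_logits) / len(fake_logits)


# Usage with k = 2 real classes (logit vectors of length 3).
real = [[4.0, 0.0, 0.0]]   # discriminator confidently says "class 0"
fake = [[0.0, 0.0, 4.0]]   # discriminator confidently says "fake"
print(discriminator_loss(real, [0], fake, k=2))  # small: D is right on both images
print(generator_loss(fake, k=2))                 # large: G's sample is being caught
```

Compared with a single real/fake output, the extra class labels give the generator a gradient that depends on which real class its samples resemble, which is one way to read the "richer feedback" described above.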

2.3 Network Architecture

The SAAE network is built on convolutional neural networks. The content and style feature extractors each contain three stride‑1 convolutional layers with no down‑sampling, so fine detail is preserved. A fully connected layer projects the style feature map to a style vector, which is then expanded back into a feature map with the same spatial size as the content feature map. The content and style feature maps are concatenated and decoded by three convolutional layers to produce the target image.

The discriminator is a standard CNN with three convolutional layers, a 2×2 max‑pooling layer after the first convolution, and two fully connected layers. It outputs a (k+1)‑dimensional probability vector covering the k real classes plus one fake class.
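The layer description can be checked with simple shape bookkeeping. The sketch below tracks (channels, height, width) through both networks. Every number in it is an assumption the summary does not specify: "same" padding (so stride-1 convolutions keep the spatial size, consistent with "no down-sampling"), 64 feature channels, a 128-dimensional style vector, a 256-unit hidden fully connected layer, an illustrative 32×96 input, and expanding the style vector by tiling it over every spatial position.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_same(shape: Shape, out_channels: int) -> Shape:
    """Stride-1 convolution with assumed 'same' padding: only the channel count changes."""
    _, h, w = shape
    return (out_channels, h, w)


def max_pool_2x2(shape: Shape) -> Shape:
    """2x2 max-pooling with stride 2 halves height and width (floor division)."""
    c, h, w = shape
    return (c, h // 2, w // 2)


def generator_shapes(h: int, w: int, in_ch: int = 3, feat: int = 64,
                     style_dim: int = 128) -> Dict[str, Shape]:
    content = style = (in_ch, h, w)
    for _ in range(3):                          # three stride-1 convs per extractor, no down-sampling
        content = conv_same(content, feat)
        style = conv_same(style, feat)
    style_vec = (style_dim, 1, 1)               # fully connected layer: flattened style map -> vector
    style_map = (style_dim, h, w)               # vector tiled to the content map's spatial size (assumed)
    merged = (content[0] + style_map[0], h, w)  # channel-wise concatenation
    out = merged
    for ch in (feat, feat, in_ch):              # three decoding convs; the last maps back to image channels
        out = conv_same(out, ch)
    return {"content": content, "style_vec": style_vec, "concat": merged, "output": out}


def discriminator_shapes(h: int, w: int, k: int, in_ch: int = 3, feat: int = 64,
                         hidden: int = 256) -> Dict[str, object]:
    x = conv_same((in_ch, h, w), feat)
    x = max_pool_2x2(x)                         # 2x2 max-pool after the first conv only
    x = conv_same(x, feat)
    x = conv_same(x, feat)
    flat = x[0] * x[1] * x[2]                   # input size of the first fully connected layer
    return {"conv_out": x, "fc1": (flat, hidden), "fc2": (hidden, k + 1)}


print(generator_shapes(32, 96))
print(discriminator_shapes(32, 96, k=10))
```

Tiling is only one way to expand a vector into a feature map; a fully connected layer followed by a reshape to (style_dim, h, w) would produce the same shapes.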

2.4 Training Strategy

Inspired by step‑wise training methods, we propose a three‑step training procedure that stabilizes optimization and improves convergence of the adversarial game.

3 Experiments

3.1 Log‑Likelihood Analysis

We evaluate SAAE on the MNIST dataset by computing the log‑likelihood of generated samples. Compared with six state‑of‑the‑art methods, SAAE achieves the highest log‑likelihood, surpassing the Adversarial Autoencoder (AAE) by approximately 0.89.
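The summary does not say how the log-likelihood was estimated. A common protocol in this line of MNIST work is a Gaussian Parzen-window estimate: place an isotropic Gaussian kernel of width σ on each generated sample, report the mean log-density of held-out test images under that mixture, and choose σ on a validation set. The sketch below implements that estimator, assuming it is the one used; the function names are hypothetical. Per test point it computes log p(x) = logsumexp_i(-||x - s_i||² / 2σ²) - log N - (d/2) log(2πσ²).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def parzen_log_density(x: Vector, samples: List[Vector], sigma: float) -> float:
    """log p(x) under an equal-weight mixture of N Gaussians N(s_i, sigma^2 I)."""
    d, n = len(x), len(samples)
    exponents = [-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * sigma ** 2) for s in samples]
    return logsumexp(exponents) - math.log(n) - 0.5 * d * math.log(2 * math.pi * sigma ** 2)


def mean_parzen_log_likelihood(test_set: List[Vector], samples: List[Vector], sigma: float) -> float:
    """Average log-density of held-out points under the Parzen window fit to generated samples."""
    return sum(parzen_log_density(x, samples, sigma) for x in test_set) / len(test_set)


def pick_sigma(valid_set: List[Vector], samples: List[Vector], candidates: Sequence[float]) -> float:
    """Choose the kernel width that maximizes the validation log-likelihood."""
    return max(candidates, key=lambda s: mean_parzen_log_likelihood(valid_set, samples, s))
```

Pure-Python loops keep the formula visible; for MNIST-sized inputs (d = 784, thousands of samples) a vectorized implementation would be needed in practice.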

3.2 Attribute‑Conditioned Face Generation

Using the Labeled Faces in the Wild (LFW) dataset, SAAE can modify specific attributes (e.g., glasses) while preserving overall identity, demonstrating effective attribute transfer.

3.3 Model Samples

We compare SAAE with DCGAN on the IIIT5K‑word and Chinese license‑plate (PLATE) datasets. SAAE samples exhibit clearer edges and more faithful character shapes.

3.4 Data Generation for Supervised Learning

We generate synthetic training data for Chinese license‑plate recognition (DR‑PLATE). Adding more generated samples slows convergence but steadily improves classification accuracy, confirming the utility of SAAE‑generated data for supervised tasks.

4 Conclusion

Future work will focus on refining the network architecture for higher generation quality and extending the framework to other applications such as semi‑supervised feature learning.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
