Understanding GANs: Theory, Minimax Game, and Training Challenges

This article introduces Generative Adversarial Networks (GANs), explains their minimax formulation, value function, Jensen‑Shannon divergence, common variants, and practical training issues such as gradient saturation, while also previewing the next topic on Hidden Markov Models.


Introduction to Generative Adversarial Networks (GANs)

In 2014, Ian Goodfellow and his colleagues conceived GANs (reportedly over drinks at a bar), proposing a new framework for training generative models. GANs quickly spread across deep learning, spawning many variants such as WGAN, InfoGAN, f‑GAN, BiGAN, DCGAN, and IRGAN.

[Figure: GAN illustration]

Conceptual Analogy

The GAN framework can be likened to a Tai‑Chi diagram: the generator (G) creates data (the "yang"), while the discriminator (D) judges authenticity (the "yin"). G samples from a prior distribution, transforms it via a neural network, and produces synthetic data; D receives both real and synthetic samples and tries to distinguish them, forming a competitive pair.

[Figure: generator–discriminator analogy]

Problem Statements

Three questions are posed:

Formulate the minimax value function of GANs, give the Nash equilibrium (G*, D*) and the value at equilibrium; then derive the optimal discriminator D*_G when G is fixed, and the optimal generator G*_D when D is fixed.

Explain how GANs avoid the costly probabilistic inference required by traditional generative models.

Discuss whether the ideal minimization objective is achieved in practice and what training problems arise.

Answers and Analysis

(1) Minimax Game and Value Function

The discriminator aims to assign high probability to real data and low probability to generated data, leading to a binary cross‑entropy loss. Assuming real and generated samples occur with equal prior probability (½ each), the loss can be written as:

L(D) = −E_{x∼p_data}[log D(x)] − E_{z∼p(z)}[log(1 − D(G(z)))]

The discriminator maximizes the corresponding value function V(G, D) while the generator minimizes it, yielding the classic minimax objective:

min_G max_D V(G, D) = E_{x∼p_data}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]
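As a concrete illustration, here is a minimal PyTorch sketch of the alternating optimization; the architectures, dimensions, data distribution, and hyperparameters below are illustrative assumptions, not from the original article:

```python
import torch
import torch.nn as nn

# Hypothetical toy networks; architectures and sizes are illustrative assumptions.
latent_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def real_batch(n=64):
    # Stand-in for samples from p_data (here: a shifted Gaussian).
    return torch.randn(n, data_dim) * 0.5 + 1.0

for step in range(1000):
    # max_D V(G, D): push D(x) toward 1 on real data, D(G(z)) toward 0 on fakes.
    x = real_batch()
    z = torch.randn(x.size(0), latent_dim)
    fake = G(z).detach()                       # freeze G while updating D
    pred_real, pred_fake = D(x), D(fake)
    loss_d = bce(pred_real, torch.ones_like(pred_real)) + \
             bce(pred_fake, torch.zeros_like(pred_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # min_G V(G, D): in practice the non-saturating form -log D(G(z)) is used,
    # which is what BCE against a target of 1 computes here.
    z = torch.randn(64, latent_dim)
    pred = D(G(z))
    loss_g = bce(pred, torch.ones_like(pred))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```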

Optimizing G minimizes the Jensen–Shannon divergence between the data distribution p_data and the generator distribution p_g. At the equilibrium p_data = p_g, the optimal discriminator outputs ½ for any input and the value function equals −log 4.

D*_G(x) = p_data(x) / (p_data(x) + p_g(x))

At equilibrium, where p_data = p_g:

D*(x) = ½ for all x, and V(G*, D*) = −log 4
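For completeness, here is a sketch of the standard pointwise derivation of the optimal discriminator, following the argument of the original GAN paper:

```latex
% Fix G and write V(G, D) as an integral over x:
%   V(G, D) = \int_x [ p_data(x) log D(x) + p_g(x) log(1 - D(x)) ] dx
% For fixed a = p_data(x) >= 0 and b = p_g(x) >= 0, the function
%   y -> a log y + b log(1 - y)
% attains its maximum on (0, 1) at y = a / (a + b), hence:
\[
  D_G^{*}(x) \;=\; \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}
\]
```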

When D is held at its optimum D*_G, minimizing the value function over G is equivalent to minimizing the Jensen–Shannon divergence between p_data and p_g, so the optimal generator satisfies p_g = p_data:

V(G, D*_G) = −log 4 + 2 · JSD(p_data ‖ p_g), minimized exactly when p_g = p_data

(2) Avoiding Probabilistic Inference

Traditional generative models require explicit density functions and costly marginal or conditional probability calculations. GANs bypass this by learning a deterministic mapping f: Z→X using a neural network, where Z is sampled from a simple prior. The Jacobian of f relates the distributions of Z and X, allowing the model to implicitly represent p(X) without evaluating partition functions.

[Figure: deterministic mapping f from the latent space Z to the data space X]
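A minimal sketch of this implicit-sampling view (the prior and network below are illustrative assumptions, not from the article): drawing a sample from p(X) is just a forward pass, with no density or partition-function evaluation.

```python
import torch
import torch.nn as nn

# Simple prior on Z and a deterministic mapping f: Z -> X.
latent_dim, data_dim = 16, 2
f = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, data_dim))

z = torch.randn(1024, latent_dim)  # z ~ N(0, I), a simple prior
x = f(z)                           # samples implicitly drawn from p(X)
# Note: p(x) itself is never evaluated; the density is represented
# implicitly as the pushforward of the prior through f.
```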

(3) Training Challenges

Early in training the generator produces poor samples that the discriminator easily rejects, causing vanishing gradients (optimization saturation). The discriminator’s sigmoid output D(x)=σ(o(x)) yields near‑zero gradients for G when D is too strong. This hampers G’s learning.

[Figure: saturation of the sigmoid discriminator output]
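The effect can be checked numerically. With D(x) = σ(o(x)), the original generator loss log(1 − D(G(z))) has derivative −D(G(z)) with respect to the logit o, so the gradient vanishes exactly when the discriminator confidently rejects fakes. A small illustrative check (the logit values are examples, not from the article):

```python
import torch

# Logits o(G(z)) for fake samples; very negative means D confidently rejects them.
o = torch.tensor([-8.0, -4.0, 0.0], requires_grad=True)
d = torch.sigmoid(o)                  # D(G(z)) = sigma(o)

loss_sat = torch.log(1 - d).sum()     # original (saturating) generator objective
loss_sat.backward()
print(o.grad)                         # ~[-3.4e-4, -1.8e-2, -5.0e-1]: vanishes as D(G(z)) -> 0
```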

The derivative of the generator’s loss with respect to its parameters thus becomes almost zero, meaning that an overly powerful discriminator provides little useful gradient for improving G. Various techniques, such as the non‑saturating generator loss −log D(G(z)), feature matching, and label smoothing, have been proposed to mitigate this issue.

Concretely, since D(x) = σ(o(x)), the chain rule gives

∇ log(1 − D(G(z))) = −D(G(z)) · ∇o(G(z)) → 0 as D(G(z)) → 0
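By contrast, the commonly used non‑saturating loss −log D(G(z)) keeps a useful gradient in exactly this regime (same illustrative setup as above):

```python
import torch

o = torch.tensor([-8.0, -4.0, 0.0], requires_grad=True)
d = torch.sigmoid(o)

loss_ns = -torch.log(d).sum()         # non-saturating generator loss: -log D(G(z))
loss_ns.backward()
print(o.grad)                         # ~[-1.0, -0.98, -0.5]: stays large where D rejects fakes
```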

Next Topic Preview: Hidden Markov Models

The upcoming article will discuss Hidden Markov Models (HMMs), a classic generative model for sequence labeling tasks such as Chinese word segmentation, POS tagging, and speech recognition. It will cover how to model Chinese word segmentation with HMMs and how to train the model from a corpus.
