Understanding WGANs: From GAN Pitfalls to Wasserstein Solutions
This article explains the shortcomings of traditional GANs, introduces the Wasserstein GAN (WGAN) as a remedy using the Earth‑Mover distance, describes the theoretical motivations, outlines the algorithmic steps and constraints, and provides illustrative diagrams and references for deeper study.
Scene Description
Inspired by the "dimensional reduction" concept from science fiction, the article uses a cartoon analogy to illustrate how low‑dimensional objects can become invisible in higher‑dimensional spaces, setting the stage for understanding why traditional GANs struggle when the data manifold lies in a low‑dimensional subspace.
Problem Description
The reader is asked to identify the problems that limit original GAN training, explain how WGAN improves upon them, detail the WGAN algorithm, and write pseudo‑code.
Answer and Analysis
1. Pitfalls of GANs
Original GANs effectively minimize the Jensen–Shannon (JS) divergence between the generator distribution and the real data distribution. In practice, training is unstable and prone to mode collapse. A key reason is that when both distributions are supported on low-dimensional manifolds inside a high-dimensional space, their supports are almost surely disjoint; the JS divergence is then constant at log 2 regardless of how far apart the distributions are, so the generator receives (near-)zero gradients and cannot improve.
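To make the saturation concrete, here is a small numeric sketch (hypothetical toy distributions, not from the original article): for two discrete distributions with disjoint supports, the JS divergence evaluates to exactly log 2 no matter how the probability mass is arranged.

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions; 0*log(0) treated as 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the mixture M = (P+Q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two distributions over 4 bins with disjoint supports:
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])
print(js(p, q))  # log 2 ≈ 0.6931 -- constant, so no useful gradient signal
```

Because the value is the same for any pair of non-overlapping distributions, moving the generator "closer" in any geometric sense does not change the loss, which is precisely the vanishing-gradient pitfall described above.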
2. How WGAN Addresses These Issues
WGAN replaces the JS divergence with the Wasserstein (Earth-Mover) distance, which provides meaningful gradients even when the supports of the two distributions do not overlap. The Wasserstein distance is continuous and almost everywhere differentiable with respect to the generator parameters, enabling stable training. Computing it directly is intractable, so WGAN uses the Kantorovich-Rubinstein dual form, which requires the critic (formerly the discriminator) to be 1-Lipschitz; this constraint is enforced by weight clipping in the original WGAN, or by a gradient penalty in the later WGAN-GP variant.
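The contrast with JS divergence is easiest to see in one dimension, where the Wasserstein-1 distance between two equal-size empirical samples reduces to the mean absolute difference of their sorted values (a standard closed form, used here as an illustrative sketch with hypothetical data):

```python
import numpy as np

def wasserstein_1d(xs, ys):
    """W1 between equal-size 1-D samples: mean |x_(i) - y_(i)| over sorted pairs."""
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))

real = np.zeros(100)  # real data: a point mass at 0
for theta in [2.0, 1.0, 0.5]:
    fake = np.full(100, theta)  # generator output: a point mass at theta
    print(theta, wasserstein_1d(real, fake))  # W1 = theta
```

Unlike the JS divergence, which stays at log 2 for every theta != 0, W1 shrinks smoothly as the generated distribution approaches the real one, so its gradient always points the generator in a useful direction.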
3. WGAN Algorithm
The training loop proceeds as follows:
for number of training iterations:
    for n_critic steps:
        sample real data x ~ P_data
        sample noise z ~ P_z
        generate fake data x̃ = G(z)
        compute critic objective: L_D = D(x) - D(x̃)
        update critic parameters by gradient ascent on L_D
        enforce the 1-Lipschitz constraint (e.g., clip weights to [-c, c])
    sample noise z ~ P_z
    generate fake data x̃ = G(z)
    compute generator loss: L_G = -D(x̃)
    update generator parameters by gradient descent on L_G

As training progresses, the critic's estimate of the Wasserstein distance decreases, indicating that the generator distribution is moving closer to the real data distribution.
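The loop above can be sketched as a runnable PyTorch example. This is a minimal toy instance, not the article's original code: the 1-D data distribution, network sizes, and hyperparameters (n_critic = 5, clip threshold 0.01, RMSprop with lr = 5e-5, as suggested in the WGAN paper) are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 1-D task: real data ~ N(3, 1), noise ~ N(0, 1).
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # critic: no sigmoid

opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_D = torch.optim.RMSprop(D.parameters(), lr=5e-5)
n_critic, clip, batch = 5, 0.01, 64

for it in range(50):
    for _ in range(n_critic):
        x = 3 + torch.randn(batch, 1)       # real samples
        z = torch.randn(batch, 1)           # noise
        # Ascend D(x) - D(G(z)), i.e. descend its negation:
        loss_D = -(D(x).mean() - D(G(z).detach()).mean())
        opt_D.zero_grad()
        loss_D.backward()
        opt_D.step()
        for p in D.parameters():            # enforce 1-Lipschitz via weight clipping
            p.data.clamp_(-clip, clip)
    z = torch.randn(batch, 1)
    loss_G = -D(G(z)).mean()                # generator loss: -D(G(z))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```

Note the two details that distinguish this from a standard GAN loop: the critic outputs an unbounded score (no sigmoid, no log loss), and its weights are clipped after every update so that -loss_D remains a valid estimate of the Wasserstein distance.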
Illustrative Figures
Hulu Beijing