Understanding WGANs: From GAN Pitfalls to Wasserstein Solutions
This article explains the shortcomings of traditional GANs, introduces the Wasserstein GAN (WGAN) as a remedy using the Earth‑Mover distance, describes the theoretical motivations, outlines the algorithmic steps and constraints, and provides illustrative diagrams and references for deeper study.
Scene Description
Inspired by the "dimensional reduction" concept from science fiction, the article uses a cartoon analogy to illustrate how low‑dimensional objects can become invisible in higher‑dimensional spaces, setting the stage for understanding why traditional GANs struggle when the data manifold lies in a low‑dimensional subspace.
Problem Description
The reader is asked to identify the problems that limit original GAN training, explain how WGAN improves upon them, detail the WGAN algorithm, and write pseudo‑code.
Answer and Analysis
1. Pitfalls of GANs
Original GANs effectively minimize the Jensen–Shannon (JS) divergence between the generator distribution and the real data distribution. In practice, training is unstable and prone to mode collapse. A key reason is that when both distributions are supported on low-dimensional manifolds inside a high-dimensional space, their supports are almost surely disjoint; the JS divergence is then constant at log 2 regardless of how far apart the distributions are, so the generator receives (near-)zero gradients and cannot improve.
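To make the saturation concrete, here is a small numeric sketch (hypothetical toy distributions, not from the original article): for two discrete distributions with disjoint supports, the JS divergence evaluates to exactly log 2 no matter how the probability mass is arranged.

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions; 0*log(0) treated as 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the mixture M = (P+Q)/2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two distributions over 4 bins with disjoint supports:
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])
print(js(p, q))  # log 2 ≈ 0.6931 -- constant, so no useful gradient signal
```

Because the value is the same for any pair of non-overlapping distributions, moving the generator "closer" in any geometric sense does not change the loss, which is precisely the vanishing-gradient pitfall described above.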
2. How WGAN Addresses These Issues
WGAN replaces the JS divergence with the Wasserstein (Earth-Mover) distance, which provides meaningful gradients even when the supports of the two distributions do not overlap. The Wasserstein distance is continuous and almost everywhere differentiable with respect to the generator parameters, enabling stable training. Computing it directly is intractable, so WGAN uses the Kantorovich-Rubinstein dual form, which requires the critic (formerly the discriminator) to be 1-Lipschitz; this constraint is enforced by weight clipping in the original WGAN, or by a gradient penalty in the later WGAN-GP variant.
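The contrast with JS divergence is easiest to see in one dimension, where the Wasserstein-1 distance between two equal-size empirical samples reduces to the mean absolute difference of their sorted values (a standard closed form, used here as an illustrative sketch with hypothetical data):

```python
import numpy as np

def wasserstein_1d(xs, ys):
    """W1 between equal-size 1-D samples: mean |x_(i) - y_(i)| over sorted pairs."""
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))

real = np.zeros(100)  # real data: a point mass at 0
for theta in [2.0, 1.0, 0.5]:
    fake = np.full(100, theta)  # generator output: a point mass at theta
    print(theta, wasserstein_1d(real, fake))  # W1 = theta
```

Unlike the JS divergence, which stays at log 2 for every theta != 0, W1 shrinks smoothly as the generated distribution approaches the real one, so its gradient always points the generator in a useful direction.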
3. WGAN Algorithm
The training loop proceeds as follows:
for number of training iterations:
    for n_critic steps:
        sample real data x ~ P_data
        sample noise z ~ P_z
        generate fake data x̃ = G(z)
        compute critic objective: L_D = D(x) - D(x̃)
        update critic parameters by gradient ascent on L_D
        enforce the 1-Lipschitz constraint (e.g., clip weights to [-c, c])
    sample noise z ~ P_z
    generate fake data x̃ = G(z)
    compute generator loss: L_G = -D(x̃)
    update generator parameters by gradient descent on L_G

As training progresses, the critic's estimate of the Wasserstein distance decreases, indicating that the generator distribution is moving closer to the real data distribution.
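The loop above can be sketched as a runnable PyTorch example. This is a minimal toy instance, not the article's original code: the 1-D data distribution, network sizes, and hyperparameters (n_critic = 5, clip threshold 0.01, RMSprop with lr = 5e-5, as suggested in the WGAN paper) are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 1-D task: real data ~ N(3, 1), noise ~ N(0, 1).
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # critic: no sigmoid

opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_D = torch.optim.RMSprop(D.parameters(), lr=5e-5)
n_critic, clip, batch = 5, 0.01, 64

for it in range(50):
    for _ in range(n_critic):
        x = 3 + torch.randn(batch, 1)       # real samples
        z = torch.randn(batch, 1)           # noise
        # Ascend D(x) - D(G(z)), i.e. descend its negation:
        loss_D = -(D(x).mean() - D(G(z).detach()).mean())
        opt_D.zero_grad()
        loss_D.backward()
        opt_D.step()
        for p in D.parameters():            # enforce 1-Lipschitz via weight clipping
            p.data.clamp_(-clip, clip)
    z = torch.randn(batch, 1)
    loss_G = -D(G(z)).mean()                # generator loss: -D(G(z))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```

Note the two details that distinguish this from a standard GAN loop: the critic outputs an unbounded score (no sigmoid, no log loss), and its weights are clipped after every update so that -loss_D remains a valid estimate of the Wasserstein distance.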
Illustrative Figures
Hulu Beijing