Generative Adversarial Networks

What

GANs frame generative modeling as a two-player game between a generator G (creates fake data) and a discriminator D (distinguishes real from fake). The generator learns to produce outputs so realistic that D can’t distinguish them from real data.

Generator G(z): random noise z → fake data G(z)
Discriminator D(x): real or fake? → probability D(x) is real

D is trained to maximize:    E_x[log D(x_real)] + E_z[log(1 - D(G(z)))]
G is trained to minimize:    E_z[log(1 - D(G(z)))]

At equilibrium, G produces perfect fakes, and D outputs 0.5 for everything (can’t tell real from fake).
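The two objectives can be sketched numerically. A minimal NumPy sketch, not a training loop; the non-saturating G loss is the practical variant suggested in the original paper for the saturation problem discussed below:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # D maximizes log D(x_real) + log(1 - D(G(z)));
    # equivalently, it minimizes the negation.
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def g_loss_minimax(d_fake):
    # Original minimax objective for G: minimize log(1 - D(G(z))).
    return np.log(1.0 - d_fake).mean()

def g_loss_nonsaturating(d_fake):
    # Non-saturating variant: maximize log D(G(z)), which keeps
    # gradients alive when D confidently rejects fakes.
    return -np.log(d_fake).mean()

# At the theoretical equilibrium D outputs 0.5 everywhere:
d_eq = np.full(4, 0.5)
print(round(d_loss(d_eq, d_eq), 4))  # 2 * log 2 ≈ 1.3863
```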

Training Dynamics

The minimax game has a unique equilibrium when:

  • D is optimal for the current G
  • G minimizes the Jensen-Shannon divergence between p_data and p_model

In practice, balancing G and D is tricky:

  • If D too weak: it accepts poor fakes, so G can win with a narrow set of degenerate outputs (mode collapse)
  • If D too strong: D(G(z)) ≈ 0 everywhere, log(1 - D(G(z))) saturates, and G's gradient vanishes (it stops learning)
  • If G too weak: D separates real from fake almost perfectly, and again G receives near-zero gradient

Practical training tips

  • Use spectral normalization on D (controls Lipschitz constant)
  • Alternate: 1 D step per G step (D needs to stay close to optimal)
  • Use one-sided label smoothing (0.9 instead of 1.0 for real) to keep D from becoming overconfident, which would starve G of gradient
  • Monitor D loss: if it goes to 0 too fast, D is too strong
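The soft-label tip can be illustrated with plain binary cross-entropy; the discriminator outputs below are made up:

```python
import numpy as np

def bce(preds, targets):
    # Mean binary cross-entropy between predictions and targets.
    return -(targets * np.log(preds) + (1 - targets) * np.log(1 - preds)).mean()

d_real = np.array([0.95, 0.99, 0.90])  # hypothetical D outputs on real images

hard = bce(d_real, np.ones_like(d_real))       # targets = 1.0
soft = bce(d_real, np.full_like(d_real, 0.9))  # one-sided label smoothing

# With target 0.9 the loss is minimized at D(x) = 0.9, not 1.0,
# so D is penalized for pushing its outputs toward certainty.
```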

Mode Collapse

The generator finds a small subset of the data distribution that fools D well, then only produces that subset. D can’t distinguish these fakes, so G has no incentive to diversify.

Solutions:

  • Unrolled GANs: D’s optimization is simulated for several steps before computing G’s gradient
  • Wasserstein GAN (WGAN): minimizes the earth mover's (Wasserstein-1) distance instead of the Jensen-Shannon divergence
  • Mixed strategies: aggregate multiple G outputs
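The WGAN change amounts to a loss swap: the critic outputs unbounded scores rather than probabilities. A minimal sketch with illustrative function names (a real implementation also needs the Lipschitz constraint enforced during training):

```python
import numpy as np

def critic_loss(scores_real, scores_fake):
    # The critic maximizes E[f(x)] - E[f(G(z))]; we minimize the negation.
    # Scores are raw reals -- no sigmoid, no log.
    return -(scores_real.mean() - scores_fake.mean())

def wgan_generator_loss(scores_fake):
    # G tries to raise the critic's score on its samples.
    return -scores_fake.mean()

def clip_weights(w, c=0.01):
    # Original WGAN enforces the Lipschitz constraint by clipping
    # critic weights (WGAN-GP later replaced this with a gradient penalty).
    return np.clip(w, -c, c)
```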

Architecture: DCGAN

The Deep Convolutional GAN (2016) established stable architecture patterns:

  • Strided convolutions instead of pooling (learns its own spatial downsampling)
  • Batch normalization in both G and D
  • LeakyReLU in D (nonzero slope for negative inputs keeps gradients flowing back to G)
  • No fully connected hidden layers (G builds spatial structure with transposed convolutions)
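The strided-convolution pattern comes down to simple size arithmetic. A sketch using the kernel/stride/padding values common in DCGAN-style generators:

```python
def conv_transpose_out(size, kernel, stride, padding):
    # Spatial output size of a transposed convolution
    # (standard formula; no output_padding or dilation).
    return (size - 1) * stride - 2 * padding + kernel

# DCGAN-style generator: project z to a 4x4 feature map, then double
# the resolution with each 4x4-kernel, stride-2, padding-1 layer.
size = 4
for _ in range(4):
    size = conv_transpose_out(size, kernel=4, stride=2, padding=1)
print(size)  # prints 64 (4 -> 8 -> 16 -> 32 -> 64)
```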

StyleGAN: Disentangling Style and Content

StyleGAN (2018, 2019) introduced the mapping network and style injection:

z → mapping network (8 FC layers) → w (style code)
w → AdaIN (Adaptive Instance Normalization) → each layer of synthesis network

This separates high-level style (from w) from stochastic variation (from independent noise inputs). Mixing styles at different layers controls coarse vs fine attributes.
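AdaIN itself is a few lines: normalize each channel, then apply a per-channel scale and bias derived from w. A NumPy sketch; in the real model the scale and bias come from a learned affine map of w, here they are random stand-ins:

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-5):
    """Adaptive Instance Normalization on a (C, H, W) feature map.

    Normalizes each channel to zero mean / unit variance, then applies
    a per-channel scale and bias predicted from the style code w.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    return style_scale[:, None, None] * x_norm + style_bias[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))  # 8 channels, 4x4 spatial
scale = rng.normal(size=8)
bias = rng.normal(size=8)
out = adain(feat, scale, bias)
# Each channel now has mean ≈ bias and std ≈ |scale|: the style code
# fully determines the channel statistics.
```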

StyleGAN2 (2020)

  • Weight demodulation instead of AdaIN (more stable, better gradient flow)
  • Path length regularization (encourages smooth interpolation)
  • No progressive growing (was used in StyleGAN1)
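Demodulation replaces AdaIN's data-dependent normalization with a rescaling of the convolution weights themselves. A sketch of the step; the shapes and the per-input-channel style vector are illustrative:

```python
import numpy as np

def demodulate(w, style, eps=1e-8):
    # StyleGAN2-style modulation + demodulation for a conv weight
    # w of shape (out_ch, in_ch, kh, kw); `style` holds one scale
    # per input channel (predicted from w-space in the real model).
    w = w * style[None, :, None, None]                       # modulate
    norm = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w / norm                                          # demodulate

rng = np.random.default_rng(1)
weights = rng.normal(size=(16, 8, 3, 3))
s = rng.normal(size=8)
w_dm = demodulate(weights, s)
# Each output channel's effective weight now has unit L2 norm, which
# keeps activation magnitudes stable without normalizing activations.
```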

StyleGAN3 (2021)

  • Alias-free generator (careful signal processing removes aliasing in the synthesis network)
  • Translation- and rotation-equivariant synthesis (fixes "texture sticking", where fine details cling to pixel coordinates in animated images)

Conditional GANs

Condition both G and D on a class label or other input:

D(x, c): "is this real image of class c?"
G(z, c): "generate fake image of class c"

This enables:

  • Class-conditional generation (class-specific outputs)
  • Image-to-image translation (Pix2Pix, CycleGAN)
  • Text-to-image (CLIP-guided generation)
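The simplest conditioning mechanism, from the original conditional GAN formulation, concatenates a label encoding onto the inputs; the dimensions below are illustrative:

```python
import numpy as np

def one_hot(label, num_classes):
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

# Append the label encoding to G's noise input; D gets the same
# treatment (label concatenated with, or broadcast over, the image).
rng = np.random.default_rng(0)
z = rng.normal(size=100)          # noise vector
c = one_hot(3, 10)                # class label 3 of 10
g_input = np.concatenate([z, c])  # G(z, c) sees a 110-dim input
```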

Image-to-Image Translation

Pix2Pix (2017)

Paired translation: (input image, output image) pairs required.

G: input domain → output domain
D: (input, output) → is this a real pair?

Example: edges → photo, satellite → map, day → night.

CycleGAN (2017)

Unpaired translation: no paired examples needed.

G: X → Y, F: Y → X
D_X: is this from domain X?
D_Y: is this from domain Y?
+ Cycle consistency: F(G(x)) ≈ x, G(F(y)) ≈ y
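The cycle-consistency term is an L1 round-trip penalty (the paper weights it with λ = 10). A sketch with toy invertible "generators"; the lambdas stand in for networks:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    # L1 error after a round trip through both generators:
    # F(G(x)) should recover x, and G(F(y)) should recover y.
    loss_x = np.abs(F(G(x)) - x).mean()
    loss_y = np.abs(G(F(y)) - y).mean()
    return lam * (loss_x + loss_y)

# Toy "generators" that are exact inverses, so the round trip is
# lossless and the cycle loss is zero.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0
x = np.array([0.5, -1.0])
y = np.array([3.0, 0.0])
print(cycle_consistency_loss(x, y, G, F))  # 0.0
```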

This enables translation between domains where paired data doesn’t exist.

Evaluation Metrics

| Metric | What it measures | Limitation |
| --- | --- | --- |
| FID (Fréchet Inception Distance) | Distribution similarity to real images | Needs large sample |
| IS (Inception Score) | Quality × diversity | Doesn't compare to real data |
| Precision/Recall | Quality vs coverage of distribution | Computationally expensive |
| LPIPS | Perceptual similarity | Needs reference network |
| Human evaluation | Subjective quality | Expensive, inconsistent |

FID is the de facto standard: lower is better, and single-digit FID on common benchmarks is typically in the near-photorealistic range. Scores are only comparable at matched sample sizes.
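FID is the Fréchet (2-Wasserstein) distance between two Gaussians fitted to Inception-v3 features of real and generated samples. A sketch of the distance itself, assuming the feature means and covariances have already been estimated:

```python
import numpy as np

def sqrtm_psd(a):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    # ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}),
    # with (C1 C2)^{1/2} evaluated in its symmetric form.
    s1 = sqrtm_psd(cov1)
    covmean = sqrtm_psd(s1 @ cov2 @ s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

# Identical Gaussians -> 0; shifting the mean by (1, 1) -> 2.
I = np.eye(2)
print(frechet_distance(np.zeros(2), I, np.zeros(2), I))  # ≈ 0.0
print(frechet_distance(np.zeros(2), I, np.ones(2), I))   # ≈ 2.0
```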

GANs vs Diffusion Models

| Aspect | GANs | Diffusion Models |
| --- | --- | --- |
| Training stability | Unstable, mode collapse | Stable, clear objective |
| Output quality | High quality when working | Excellent, less mode collapse |
| Diversity | Can collapse | Full distribution coverage |
| Inference speed | 1 forward pass | 20-100 steps (slow) |
| Mode collapse | Problematic | Rare |
| Controllability | Style mixing, latent arithmetic | Guidance-based |

GANs produce excellent samples but are notoriously hard to train. Diffusion models sacrifice inference speed for training stability and mode coverage.

Modern Status

Largely superseded by diffusion models for image generation (Stable Diffusion, DALL-E 3). Still widely used for:

  • Real-time applications (game assets, video frames)
  • Domain-specific translation where paired data exists
  • Style transfer with perceptual quality requirements
  • Research on representation learning and disentanglement

Key Papers

  • Generative Adversarial Nets (Goodfellow et al., 2014, NeurIPS) — the original GAN · arXiv:1406.2661
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (Radford et al., 2016, ICLR) — DCGAN · arXiv:1511.06434
  • Image-to-Image Translation with Conditional Adversarial Networks (Isola et al., 2017, CVPR) — Pix2Pix · arXiv:1611.07004
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (Zhu et al., 2017, ICCV) — CycleGAN · arXiv:1703.10593
  • A Style-Based Generator Architecture for Generative Adversarial Networks (Karras et al., 2019, CVPR) — StyleGAN · arXiv:1812.04948
  • Analyzing and Improving the Image Quality of StyleGAN (Karras et al., 2020, CVPR) — StyleGAN2 · arXiv:1912.04958