Denoising Diffusion Probabilistic Models

Ho, Jain, Abbeel (2020)

Read paper

Why It Matters

Iterative denoising through learned reverse diffusion. Foundation of Stable Diffusion, DALL-E 2, Midjourney.

Key Ideas

  1. Define generation as reversing a gradual noising process rather than sampling all structure in one shot.
  2. Train the model to predict and remove noise, turning generation into a sequence of denoising steps.
  3. Trade fast sampling for strong sample quality and stable likelihood-based training.
  4. Establish the core recipe that later diffusion image, audio, and video models scaled up.

Notes

  • The key insight is that hard generation can be decomposed into many easier denoising problems.
  • Later diffusion work mostly improved speed, conditioning, and scale on top of this framework.