
Diffusion Model

A class of generative models that learn to reverse a gradual noising process, generating data by iteratively denoising from pure noise.

A diffusion model is a type of deep generative model that produces samples by learning to reverse a gradual noising process. During training, a forward diffusion process incrementally adds Gaussian noise to data until it becomes pure noise. A neural network then learns the reverse process, removing noise at each timestep to reconstruct coherent data.
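The forward process described above has a convenient closed form: the noisy sample at any timestep t can be drawn directly from the clean data, which is what makes training efficient. Below is a minimal NumPy sketch of this closed-form noising step, using the linear variance schedule from the DDPM paper; the function names and schedule parameters are illustrative, not from any particular library.

```python
import numpy as np

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
    Returns the noisy sample and the noise, which serves as the
    training target for the denoising network."""
    noise = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise
    return xt, noise

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = rng.standard_normal((8, 8))          # stand-in for a data sample
xt, eps = forward_diffuse(x0, 999, alpha_bars, rng)
```

At the final timestep, alpha_bar is close to zero, so x_t is almost pure Gaussian noise; during training, the network is given x_t and t and learns to predict eps.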

The breakthrough came with DDPM (Denoising Diffusion Probabilistic Models) in 2020, which demonstrated image generation quality competitive with GANs. This led to large-scale systems such as Stable Diffusion and DALL-E 2. Compared to GANs, diffusion models offer more stable training and largely avoid mode collapse, at the cost of slower, iterative sampling.

Beyond image synthesis, diffusion models power super-resolution, inpainting, video generation, and 3D asset creation. Active research focuses on faster sampling (DDIM, DPM-Solver, consistency models) and improved controllability (ControlNet, IP-Adapter).
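To illustrate why samplers like DDIM speed things up: instead of taking a small stochastic step at every timestep, a DDIM step uses the network's noise estimate to predict the clean sample, then jumps to an earlier timestep deterministically, allowing far fewer steps. A hedged NumPy sketch of one such step (eta = 0, i.e. fully deterministic); here an oracle noise estimate stands in for the trained network.

```python
import numpy as np

def ddim_step(xt, eps_pred, t, t_prev, alpha_bars):
    """One deterministic DDIM update: recover a prediction of x_0
    from the noise estimate, then re-noise it to timestep t_prev."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
    x0_pred = (xt - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred

# Linear DDPM schedule, as in the forward process.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
noise = rng.standard_normal((8, 8))
t = 999
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

# With a perfect noise estimate, a single large jump nearly recovers x0.
x_prev = ddim_step(xt, noise, t, 0, alpha_bars)
```

In practice the noise estimate comes from the trained network and the sampler takes a few dozen such steps along a strided subsequence of timesteps, rather than one giant jump.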
