Understanding Denoising Diffusion Probabilistic Models: Fundamentals and Process
This article explains the fundamentals of denoising diffusion probabilistic models, detailing the forward Gaussian noise injection, the reverse reconstruction via learned conditional densities, model architecture, loss functions, and experimental results on synthetic datasets, all supported by key research citations.
Introduction
The core idea of diffusion modeling is to learn a process that reverses the information degradation caused by noise. Like a VAE, the model learns a latent-variable decoder, but here the latents form a Markov chain of progressively noisier states, and data is decoded by hierarchical denoising.
Denoising Diffusion Model
The concept originates from diffusion processes in physics and probabilistic tools such as Markov chains. The original denoising diffusion approach was proposed by Sohl‑Dickstein et al. (2015) [1].
Forward Process
The forward diffusion is formally defined as a Markov chain that adds Gaussian noise to the data over T steps, producing a sequence of increasingly noisy samples x₁, …, x_T. The conditional density at time t depends only on the previous step, so the full joint distribution factorizes analytically (see the equations below). The variance parameter βₜ can be held constant or follow a schedule (e.g., sigmoid or tanh).
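In standard DDPM notation these densities are

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t\mathbf{I}\big), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}).$$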
By substituting αₜ = 1 − βₜ and defining ᾱₜ as the cumulative product of the αₛ up to step t, the forward equations can be collapsed so that a noisy sample at any time t is drawn from the original data x₀ in a single step.
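Concretely, the standard closed-form marginal is

$$q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\big), \qquad x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon, \quad \varepsilon \sim \mathcal{N}(0,\mathbf{I}).$$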
Reconstruction (Reverse Process)
The reverse process estimates the conditional density q(xₜ₋₁ | xₜ) from the current noisy state. Because this density is intractable, a neural network with learned weights θ is trained to approximate it as p_θ(xₜ₋₁ | xₜ), conditioned on the time step t.
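The learned density is typically parameterized as a Gaussian whose moments are produced by the network:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big).$$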
Ho et al. (2020) [3] propose fixing the reverse variance to Σ_θ = σₜ²I with σₜ² = βₜ, which leads to a concrete sampling step for obtaining xₜ₋₁ from xₜ.
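Under the ε-prediction parameterization of Ho et al. (2020), that sampling step reads

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\varepsilon_\theta(x_t, t)\right) + \sqrt{\beta_t}\,z, \qquad z \sim \mathcal{N}(0, \mathbf{I}),$$

with z omitted at the final step. A minimal PyTorch sketch of one such step (the name eps_model and the precomputed schedule tensors are illustrative assumptions; the article's own network predicts a mean and a variance instead):

```python
import torch

@torch.no_grad()
def p_sample(eps_model, x_t, t, betas, alphas, alphas_bar):
    """One reverse step x_t -> x_{t-1} with fixed variance sigma_t^2 = beta_t.

    eps_model(x, t) is assumed to predict the injected noise (the
    epsilon-parameterization of Ho et al., 2020); t is a 0-indexed step.
    """
    eps = eps_model(x_t, torch.full((x_t.shape[0],), t, device=x_t.device))
    # Posterior mean of p_theta(x_{t-1} | x_t) under the fixed-variance choice.
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the last step
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
```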
Model Construction
The network mirrors a VAE architecture: an input layer matching data dimensionality, multiple hidden linear layers with activation functions, and an output layer of the same size as the input. The final layer splits into two heads that predict the mean and variance of the conditional density.
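A minimal PyTorch sketch of such a network (layer sizes, the ReLU activation, and the scalar time-step feature are illustrative assumptions rather than the article's exact configuration):

```python
import torch
import torch.nn as nn

class DenoiseMLP(nn.Module):
    """MLP denoiser whose final layer splits into mean and variance heads."""

    def __init__(self, data_dim=2, hidden_dim=128, n_layers=3):
        super().__init__()
        layers, in_dim = [], data_dim + 1  # +1 for a scalar time-step feature
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        self.backbone = nn.Sequential(*layers)
        self.mean_head = nn.Linear(hidden_dim, data_dim)    # mu_theta(x_t, t)
        self.logvar_head = nn.Linear(hidden_dim, data_dim)  # log of Sigma_theta(x_t, t)

    def forward(self, x, t):
        # Concatenate the time step as an extra input feature.
        h = self.backbone(torch.cat([x, t.float().unsqueeze(-1)], dim=-1))
        return self.mean_head(h), self.logvar_head(h)
```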
Loss Function Calculation
The training objective minimizes a loss derived from the KL divergence between two Gaussian distributions and an entropy term, as originally formulated by Sohl‑Dickstein et al. (2015) [1] and later simplified by Ho et al. (2020) [3].
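For reference, the KL divergence between two univariate Gaussians (applied per dimension for diagonal covariances) has the closed form

$$D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2},$$

so every term in the variational bound can be computed analytically.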
Further simplifications lead to the final loss expression shown below.
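A standard form of this final expression is the simplified objective of Ho et al. (2020) [3],

$$L_{\text{simple}}(\theta) = \mathbb{E}_{t,\,x_0,\,\varepsilon}\Big[\big\lVert \varepsilon - \varepsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\varepsilon,\ t\big)\big\rVert^2\Big],$$

i.e., a mean-squared error between the injected noise and the network's prediction of it.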
Results
The forward diffusion over 100 steps produces progressively noisier samples; ten example trajectories are visualized. The reverse diffusion reconstructs data from isotropic Gaussian noise, with reconstruction quality depending on hyper‑parameter tuning and the number of training epochs.
Qualitative results on three synthetic datasets (Swiss Roll, Two‑Moons, S‑Curve) demonstrate the model’s ability to recover underlying structures.
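A hedged sketch of the forward-diffusion experiment on the Swiss Roll (the linear schedule, sample count, and rescaling are assumptions, not the article's reported settings):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

T = 100
betas = np.linspace(1e-4, 0.02, T)    # illustrative linear schedule
alphas_bar = np.cumprod(1.0 - betas)  # cumulative products for the closed form

X, _ = make_swiss_roll(n_samples=1000, noise=0.3, random_state=0)
x0 = X[:, [0, 2]] / 10.0              # project the Swiss Roll to 2D and rescale

rng = np.random.default_rng(0)

def q_sample(x0, t):
    """Draw x_t directly from x_0 via the closed-form marginal q(x_t | x_0)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

# Snapshots along the trajectory, e.g. for plotting with matplotlib.
snapshots = {t: q_sample(x0, t) for t in (0, 24, 49, 99)}
```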
References
[1] Sohl‑Dickstein, J., Weiss, E. A., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. arXiv:1503.03585.
[2] Welling, M., & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. ICML 2011.
[3] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. arXiv:2006.11239.
[4] Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. arXiv:2105.05233.
