Diffusion Models

From DDPM to Flow Matching

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I}\right)$$

$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}$$

$$\text{where} \quad \alpha_t = 1 - \beta_t, \quad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$
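As a quick sanity check on the closed form above, the sketch below builds a noise schedule in plain Python and confirms that applying the one-step kernel repeatedly yields the same noise variance, 1 - \bar{\alpha}_t. The linear schedule from 1e-4 to 0.02 over T = 1000 steps is an assumption (a common DDPM default, not stated in this text), and all helper names are illustrative.

```python
def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_1..beta_T (assumed schedule, not from the text)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def iterated_noise_variance(betas):
    """Noise variance obtained by applying q(x_t | x_{t-1}) step by step.

    Each step scales the existing noise variance by alpha_t and adds beta_t.
    """
    out, var = [], 0.0
    for beta in betas:
        var = (1.0 - beta) * var + beta
        out.append(var)
    return out


def snr(alpha_bar):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar)."""
    return alpha_bar / (1.0 - alpha_bar)


betas = linear_betas()
abar = alpha_bars(betas)
var = iterated_noise_variance(betas)

# Largest disagreement between the iterated variance and 1 - alpha_bar_t
max_gap = max(abs(v - (1.0 - a)) for v, a in zip(var, abar))
```

Under this assumed schedule, `snr(abar[0])` is large and `snr(abar[-1])` is close to zero, which is the sense in which the final step is nearly pure noise.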

[Interactive figure: timestep slider from t = 0 (SNR: ∞) to T = 1000]

$$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\left(\mathbf{x}_{t-1}; \boldsymbol{\mu}_\theta(\mathbf{x}_t, t), \sigma_t^2 \mathbf{I}\right)$$

$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right)$$
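One way to see where this mean comes from: if the network predicted the true noise exactly, the expression would coincide with the mean of the Gaussian posterior q(x_{t-1} | x_t, x_0). The scalar sketch below checks that numerically. The posterior-mean formula is taken from the standard DDPM derivation rather than from this text, and the toy numbers and function names are illustrative assumptions.

```python
import math


def reverse_mean(x_t, eps_pred, alpha_t, alpha_bar_t):
    """Mean of p_theta(x_{t-1} | x_t) in the noise-prediction parameterization."""
    beta_t = 1.0 - alpha_t
    return (x_t - beta_t / math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_t)


def posterior_mean(x_t, x_0, alpha_t, alpha_bar_t, alpha_bar_prev):
    """Mean of q(x_{t-1} | x_t, x_0) (standard DDPM result, assumed here)."""
    beta_t = 1.0 - alpha_t
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return coef_x0 * x_0 + coef_xt * x_t


# Toy scalar values, chosen arbitrarily for illustration
alpha_bar_prev = 0.6
alpha_t = 0.95
alpha_bar_t = alpha_bar_prev * alpha_t
x_0, eps = 1.5, -0.3

# Forward sample x_t from x_0 using the closed form
x_t = math.sqrt(alpha_bar_t) * x_0 + math.sqrt(1.0 - alpha_bar_t) * eps

mu_from_eps = reverse_mean(x_t, eps, alpha_t, alpha_bar_t)
mu_posterior = posterior_mean(x_t, x_0, alpha_t, alpha_bar_t, alpha_bar_prev)
```

The two means agree up to floating-point error, so a good noise predictor is enough to recover the reverse-step mean.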


$$L_{\text{simple}}(\theta) = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon},\, t\right) \right\|^2 \right]$$
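To make the expectation concrete, here is a minimal Monte Carlo sketch of this loss for scalar data in plain Python. The four-entry `alpha_bar` list, the two stand-in "models", and the function names are all hypothetical; a real setup would use a neural network and tensors.

```python
import math
import random


def simple_loss(eps_model, x0_samples, alpha_bar, rng):
    """Monte Carlo estimate of L_simple for scalar data."""
    total = 0.0
    for x_0 in x0_samples:
        t = rng.randrange(len(alpha_bar))   # uniform timestep index
        eps = rng.gauss(0.0, 1.0)           # eps ~ N(0, 1)
        a = alpha_bar[t]
        x_t = math.sqrt(a) * x_0 + math.sqrt(1.0 - a) * eps
        total += (eps - eps_model(x_t, t)) ** 2
    return total / len(x0_samples)


alpha_bar = [0.9, 0.7, 0.4, 0.1]   # toy schedule (assumption)
data_point = 2.0                   # dataset consisting of one value
x0_samples = [data_point] * 2000


def zero_model(x_t, t):
    """Baseline that always predicts zero noise."""
    return 0.0


def oracle_model(x_t, t):
    """Exact noise for single-point data: invert the forward closed form."""
    a = alpha_bar[t]
    return (x_t - math.sqrt(a) * data_point) / math.sqrt(1.0 - a)


loss_zero = simple_loss(zero_model, x0_samples, alpha_bar, random.Random(0))
loss_oracle = simple_loss(oracle_model, x0_samples, alpha_bar, random.Random(0))
```

The zero predictor scores close to E[ε²] = 1, while the oracle drives the loss to zero. The oracle only exists here because the toy dataset contains a single point.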


Part II: Flow Matching

$$\frac{dx}{dt} = v_\theta(x, t), \quad t \in [0, 1]$$

$$x_t = (1 - t)\, x_0 + t\, x_1$$

$$v_{\text{target}}(x_t, t) = x_1 - x_0$$

$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, x_1} \left[ \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2 \right]$$

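```python
import torch
import torch.nn.functional as F


def train_step(model, x_1):
    B = x_1.shape[0]
    # t ~ U[0, 1], one time value per sample
    t = torch.rand(B, device=x_1.device)
    # Reshape so t broadcasts over the non-batch dimensions of x_1
    t_b = t.view(B, *([1] * (x_1.dim() - 1)))
    # x_0 ~ N(0, I)
    x_0 = torch.randn_like(x_1)
    # Linear interpolation
    x_t = (1 - t_b) * x_0 + t_b * x_1
    # Target velocity is simply the difference
    v_target = x_1 - x_0
    # Predict velocity field (t has shape (B,), as in sampling)
    v_pred = model(x_t, t)
    # MSE loss
    loss = F.mse_loss(v_pred, v_target)
    return loss
```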

```python
import torch


@torch.no_grad()  # sampling does not need gradients
def sample_euler(model, steps=50, shape=(1, 3, 256, 256)):
    x = torch.randn(shape)  # Start at noise (t=0)
    dt = 1.0 / steps
    for i in range(steps):
        # One time value per sample, shape (B,)
        t = torch.full((shape[0],), i * dt)
        v = model(x, t)
        # Euler integration step
        x = x + v * dt
    return x  # Data at t=1
```
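To illustrate why straight paths make Euler sampling cheap, the sketch below runs the same Euler loop on a one-dimensional toy problem in plain Python. It assumes the data distribution is a single point `target`, in which case the ideal velocity field is (target - x) / (1 - t). That closed-form field and the helper names are illustrative assumptions, standing in for a trained model.

```python
import random


def oracle_velocity(x, t, target):
    """Ideal velocity field when all data sits at one point (toy assumption)."""
    return (target - x) / (1.0 - t)


def sample_euler_1d(velocity, x_start, steps):
    """Scalar version of the Euler sampler: integrate from t=0 to t=1."""
    x, dt = x_start, 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + velocity(x, t) * dt
    return x


target = 2.0
x_start = random.Random(0).gauss(0.0, 1.0)  # noise sample at t=0

# Same starting noise, different step counts
results = {
    steps: sample_euler_1d(lambda x, t: oracle_velocity(x, t, target), x_start, steps)
    for steps in (1, 4, 50)
}
```

Because the trajectory is a straight line, every step count lands on `target`, including a single step. Learned velocity fields are only approximately straight, so in practice more steps still help.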

$$\mathbf{x}_{t-\Delta t} = \underbrace{(1-(t-\Delta t))}_{\text{signal coeff}} \hat{\mathbf{x}}_0 + (t-\Delta t)\underbrace{\cos\!\left(\frac{\eta\pi}{2}\right)}_{\text{predicted noise}} \hat{\mathbf{x}}_1 + (t-\Delta t)\underbrace{\sin\!\left(\frac{\eta\pi}{2}\right)}_{\text{fresh noise}} \boldsymbol{\epsilon}$$

Note that this update uses the reverse time convention from the training setup above: here t = 1 is pure noise and sampling steps toward t = 0, so \hat{\mathbf{x}}_0 is the predicted clean sample and \hat{\mathbf{x}}_1 is the predicted noise.
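The coefficients in this update can be checked with a few lines of plain Python. The sketch below follows the formula as written here, not the paper's reference implementation, and the function names are hypothetical. It assumes the predicted noise and the fresh noise are independent with unit variance, so their coefficients combine in quadrature.

```python
import math


def cps_coefficients(t, dt, eta):
    """Coefficients of the update for the step from t to s = t - dt."""
    s = t - dt
    signal = 1.0 - s                                  # weight on predicted clean sample
    predicted_noise = s * math.cos(eta * math.pi / 2.0)
    fresh_noise = s * math.sin(eta * math.pi / 2.0)
    return signal, predicted_noise, fresh_noise


def cps_step(x0_hat, x1_hat, eps, t, dt, eta):
    """Apply one update to scalar predictions x0_hat (signal) and x1_hat (noise)."""
    a, b, c = cps_coefficients(t, dt, eta)
    return a * x0_hat + b * x1_hat + c * eps


signal, pred, fresh = cps_coefficients(t=1.0, dt=0.1, eta=0.7)

# Combined noise level under the independence assumption
total_noise = math.hypot(pred, fresh)

# With eta = 0 the fresh-noise term vanishes and the step is deterministic
deterministic = cps_step(x0_hat=1.0, x1_hat=-2.0, eps=123.0, t=1.0, dt=0.1, eta=0.0)
```

Since cos² + sin² = 1, `total_noise` equals t - Δt for any η, so the noise level stays on the schedule while η only sets how much of that noise is freshly sampled.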


| Method | Stochasticity | Noise Artifacts | Matches Scheduler |
| --- | --- | --- | --- |
| ODE (Euler) | None | None | Yes |
| Flow-SDE | Uncontrolled | Severe | No (Excess) |
| CPS | Controlled (η) | None | Yes (Exact) |

Reference: Wang & Yu, "Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching," arXiv:2509.05952, 2025.

$$\text{Error}_{\text{Euler}} \propto \max_{t} \left\| \nabla_x v_\theta \right\| \cdot dt^2$$
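The quadratic dependence on the step size is easy to observe numerically. The sketch below measures the error of a single Euler step on the test equation dx/dt = x, chosen only because its exact solution is known. The test equation and the function name are assumptions for illustration, not part of the model above.

```python
import math


def euler_local_error(dt, x=1.0):
    """Error of one Euler step for dx/dt = x, starting from x at t=0."""
    exact = x * math.exp(dt)   # exact solution after time dt
    euler = x + x * dt         # one Euler step with velocity v(x) = x
    return abs(exact - euler)


# Halving the step size should cut the per-step error by roughly 4x
ratio = euler_local_error(0.1) / euler_local_error(0.05)
```

A ratio near 4 is what a dt² error term predicts. For a constant velocity field the gradient term is zero and a single Euler step is exact, which is the appeal of straight paths.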

| Feature | DDPM / DDIM | Flow Matching |
| --- | --- | --- |
| Mathematical Framework | Stochastic Differential Equations (SDE) | Ordinary Differential Equations (ODE) |
| Training Target | Predict Noise $\epsilon$ or Data $x_0$ | Predict Velocity $v = x_1 - x_0$ |
| Path Shape | Curved (Noise Schedule dependent) | Straight (Linear Interpolation) |
| Typical Sampling Steps | 20 - 1000 | 1 - 50 (with Reflow) |