Can One-Step Generative Modeling Beat Multi-Step Diffusion? Inside MeanFlow
The article presents MeanFlow, a one‑step generative modeling framework that replaces the instantaneous velocity with an average‑velocity field. With a single function evaluation it reaches an FID of 3.43 on ImageNet 256×256 — a new state of the art for one‑step models trained from scratch — and with two evaluations it approaches strong multi‑step diffusion baselines.
Paper Overview
MeanFlow is a one‑step generative modeling framework introduced in “Mean Flows for One‑step Generative Modeling”. Pre‑print: https://arxiv.org/pdf/2505.13447v1
Key Idea
The method introduces a ground‑truth field representing the average velocity u over a time interval [r, t], instead of the instantaneous velocity v used in conventional flow matching. An intrinsic identity relating u and v is derived, providing a principled training signal without numerically integrating trajectories.
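To make the distinction concrete, here is a toy one‑dimensional illustration (my own, not from the paper): for the path z_s = s², the instantaneous velocity is v(s) = 2s, while the average velocity over [r, t] is (t² − r²)/(t − r) = t + r — a genuinely different quantity.

```python
# Toy 1-D example (illustrative, not from the paper): the path z_s = s**2
# has instantaneous velocity v(s) = 2s, while the average velocity over
# [r, t] is (1/(t-r)) * integral_r^t v(s) ds = t + r, which generally
# differs from the endpoint velocity v(t) = 2t.

def v(s):
    """Instantaneous velocity of the path z_s = s**2."""
    return 2.0 * s

def average_velocity(r, t, n=1000):
    """Midpoint-rule estimate of (1/(t-r)) * integral_r^t v(s) ds."""
    h = (t - r) / n
    total = sum(v(r + (i + 0.5) * h) for i in range(n))
    return total * h / (t - r)

r, t = 0.2, 0.9
print(average_velocity(r, t))  # ~1.1  (= t + r)
print(v(t))                    # 1.8   (= 2t)
```

A one‑step sampler needs exactly this averaged quantity: multiplying it by the elapsed time recovers the total displacement in one shot.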
Method
Average velocity over an interval [r, t] is defined as

u(z_t, r, t) = \frac{1}{t - r} \int_{r}^{t} v(z_s, s)\, ds

Differentiating both sides with respect to t (product rule on (t − r)·u) yields the MeanFlow identity linking u and v:

u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t), \qquad \frac{d}{dt} u = v \cdot \nabla_z u + \partial_t u

A neural network f_θ is trained to predict u. The identity supplies the regression target directly, so no numerical integration of v is required:

L(θ) = \mathbb{E}_{z_t, r, t}\bigl[\, \| f_θ(z_t, r, t) - \mathrm{sg}(u_{\mathrm{tgt}}) \|^2 \,\bigr], \qquad u_{\mathrm{tgt}} = v(z_t, t) - (t - r)\bigl(v \cdot \nabla_z f_θ + \partial_t f_θ\bigr)

where sg(·) denotes stop‑gradient and the total derivative of f_θ is obtained with a single Jacobian–vector product. During sampling, a single function evaluation (1‑NFE) of the learned average‑velocity field carries a noise sample to a data sample.
Algorithm (pseudo‑code)
# Training loop (no numerical integration of v is required)
for each minibatch:
    sample data x, noise e, and times (r, t) with r <= t
    z_t = (1 - t) * x + t * e        # point on the interpolation path
    v   = e - x                      # conditional instantaneous velocity
    # one Jacobian-vector product gives the total derivative of f_θ
    u_pred, df_dt = jvp(f_θ, (z_t, r, t), (v, 0, 1))
    u_tgt = v - (t - r) * df_dt      # target from the MeanFlow identity
    loss  = || u_pred - stopgrad(u_tgt) ||^2
    back-propagate loss
# Sampling (1-NFE)
z_1 ~ N(0, I)
z_0 = z_1 - f_θ(z_1, 0, 1)           # single Euler step over [0, 1]
return z_0
Experimental Setup
Evaluations were performed on:
Class‑conditional ImageNet 256×256
Unconditional CIFAR‑10 (32×32)
All models were trained from scratch without any pre‑training, knowledge distillation, or curriculum learning.
Results
ImageNet 256×256
1‑NFE: FID = 3.43, versus a previous best one‑step FID of ≈ 7.77 — a relative improvement of more than 50 %.
2‑NFE: FID = 2.20, approaching multi‑step baselines DiT (FID 2.27) and SiT (FID 2.15), which use 250 sampling steps with classifier‑free guidance (250 × 2 NFE).
CIFAR‑10
Unconditional generation achieves FID scores competitive with prior work (exact numbers appear in the paper's tables).
Ablation Study
Removing the MeanFlow identity or the consistency term degrades performance, confirming that each component contributes to the final results.
Sample Outputs
High‑quality samples generated with a single function evaluation are shown in the paper's figures.