Can One-Step Generative Modeling Beat Multi-Step Diffusion? Inside MeanFlow

The article presents MeanFlow, a one‑step generative modeling framework that replaces the instantaneous velocity of flow matching with an average‑velocity field, achieving an FID of 3.43 on ImageNet 256×256 with a single function evaluation, substantially outperforming prior single‑step models and narrowing the gap with multi‑step diffusion baselines.

AI Frontier Lectures

Paper Overview

MeanFlow is a one‑step generative modeling framework introduced in “Mean Flows for One‑step Generative Modeling”. Pre‑print: https://arxiv.org/pdf/2505.13447v1

Key Idea

The method introduces a well‑defined ground‑truth field: the average velocity u over a time interval [r, t], in place of the instantaneous velocity v used in conventional flow matching. An intrinsic relationship between u and v is derived, providing a principled training signal that requires no integration of the velocity field during training.

Method

Average velocity over an interval [r, t] is defined as the time average of the instantaneous velocity:

u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\,d\tau
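This definition is easy to sanity-check numerically. A minimal sketch, assuming a toy, state-independent velocity field v(z, τ) = τ (an illustration, not from the paper): the time average over [r, t] should come out to (r + t) / 2.

```python
def v(z, tau):
    # Toy instantaneous velocity field: depends only on time.
    return tau

def average_velocity(z, r, t, n=100_000):
    # u(z, r, t) = 1/(t - r) * integral_r^t v(z, tau) d(tau),
    # approximated with the trapezoidal rule.
    h = (t - r) / n
    total = 0.5 * (v(z, r) + v(z, t))
    for i in range(1, n):
        total += v(z, r + i * h)
    return total * h / (t - r)

u = average_velocity(z=0.0, r=0.2, t=1.0)
print(u)  # ≈ (0.2 + 1.0) / 2 = 0.6
```

The trapezoidal rule is exact for this linear field, so the result matches the closed form up to floating-point rounding.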

Differentiating this definition with respect to t (product rule, plus the fact that z_t evolves along v) yields the MeanFlow identity linking u and v:

u(z_t, r, t) = v(z_t, t) - (t - r)\,\frac{d}{dt} u(z_t, r, t)

where the total derivative expands as \frac{d}{dt} u = v(z_t, t)\cdot\nabla_z u + \partial_t u.
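The identity u(z, r, t) = v(z, t) − (t − r)·(d/dt)u can be verified on the same toy field, which is an illustration rather than the paper's setup. For v(z, τ) = τ, the true average velocity is u = (r + t)/2, ∇_z u = 0, and ∂_t u = 1/2, so the right-hand side is t − (t − r)/2 = (r + t)/2, matching u. A small numeric confirmation using finite differences for the derivatives:

```python
def u_true(z, r, t):
    # Closed-form average of v(z, tau) = tau over [r, t].
    return 0.5 * (r + t)

def total_dt_u(z, r, t, eps=1e-6):
    # d/dt u = v * du/dz + du/dt, with derivatives by central differences.
    vel = t  # v(z, t) = t for the toy field
    du_dz = (u_true(z + eps, r, t) - u_true(z - eps, r, t)) / (2 * eps)
    du_dt = (u_true(z, r, t + eps) - u_true(z, r, t - eps)) / (2 * eps)
    return vel * du_dz + du_dt

z, r, t = 0.3, 0.2, 0.9
lhs = u_true(z, r, t)
rhs = t - (t - r) * total_dt_u(z, r, t)  # v - (t - r) * d/dt u
print(abs(lhs - rhs))  # ≈ 0
```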

A neural network f_θ is trained to predict u. Rather than integrating v to obtain a ground‑truth target, the identity itself supplies the regression target:

L(θ) = \mathbb{E}_{z_t, r, t}\bigl[\| f_θ(z_t, r, t) - \mathrm{sg}(u_{\text{tgt}}) \|^2\bigr], \quad u_{\text{tgt}} = v - (t - r)\bigl(v \cdot \nabla_z f_θ + \partial_t f_θ\bigr)

where sg denotes stop‑gradient and the total‑derivative term is computed with a single Jacobian–vector product (JVP).

During sampling, a single function evaluation (1‑NFE) suffices: because the network predicts the average velocity over an interval, one Euler step using f_θ(z_1, 0, 1) spans the entire trajectory and yields a data sample.

Algorithm (pseudo‑code)

# Training loop
for each minibatch:
    sample data x, noise ε ~ N(0, I), and times r ≤ t in [0, 1]
    z_t = (1 − t)·x + t·ε                  # linear interpolation path
    v = ε − x                              # conditional instantaneous velocity
    u_pred, du_dt = jvp(f_θ, (z_t, r, t), tangents=(v, 0, 1))  # one JVP call
    u_tgt = v − (t − r)·du_dt              # MeanFlow identity target
    loss = ||u_pred − stopgrad(u_tgt)||^2
    back‑propagate loss
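The target construction u_tgt = v − (t − r)·(v·∇_z f + ∂_t f) can be sketched in plain Python, with finite differences standing in for the autodiff JVP. The toy 1‑D field here is illustrative, not the paper's implementation; the property shown is that when f equals the true average velocity, it reproduces its own target, so the residual driving the loss is zero.

```python
def v(z, t):
    # Toy instantaneous velocity (illustrative, not the paper's): v(z, t) = t.
    return t

def f_true(z, r, t):
    # True average velocity of the toy field over [r, t].
    return 0.5 * (r + t)

def meanflow_target(f, z, r, t, eps=1e-6):
    # u_tgt = v - (t - r) * (v * df/dz + df/dt), derivatives by finite differences.
    vel = v(z, t)
    df_dz = (f(z + eps, r, t) - f(z - eps, r, t)) / (2 * eps)
    df_dt = (f(z, r, t + eps) - f(z, r, t - eps)) / (2 * eps)
    return vel - (t - r) * (vel * df_dz + df_dt)

z, r, t = 0.3, 0.2, 0.9
resid = f_true(z, r, t) - meanflow_target(f_true, z, r, t)
print(abs(resid))  # ≈ 0: the true u is a fixed point of the target
```

In training, the target side is wrapped in a stop-gradient, so the network is pulled toward this fixed point rather than chasing a moving one.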

# Sampling (1‑NFE when Δt spans the whole interval [0, 1])
z_1 ~ N(0, I)
for t from 1 down to 0 (step size Δt):
    r = t − Δt
    û = f_θ(z_t, r, t)
    z_r = z_t − (t − r)·û    # one Euler step per interval; exact if û is the true u
return z_0
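For the linear interpolation path, the conditional velocity ε − x is constant in time, so the conditional average velocity equals it, and a single Euler step maps the noise endpoint exactly back to the data point. A 1‑D toy check, with an oracle average‑velocity field standing in for a trained f_θ (in practice the network approximates the marginal, not the conditional, field):

```python
import random

random.seed(0)
x = 1.7                       # a "data" sample
eps = random.gauss(0.0, 1.0)  # noise endpoint of the path

# Linear path: z_t = (1 - t) * x + t * eps, so z_1 = eps.
z1 = eps

def u_oracle(z, r, t):
    # Conditional average velocity for this path: the constant eps - x.
    return eps - x

# One Euler step over the whole interval [0, 1] (1-NFE).
z0 = z1 - (1.0 - 0.0) * u_oracle(z1, 0.0, 1.0)
print(z0)  # ≈ 1.7, the original data point
```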

Experimental Setup

Evaluations were performed on two image‑generation benchmarks:

ImageNet 256×256 (class‑conditional)

CIFAR‑10 32×32 (unconditional)

All models were trained from scratch without any pre‑training, knowledge distillation, or curriculum learning.

Results

ImageNet 256×256

1‑NFE: FID = 3.43, versus ≈ 7.77 for the previous best single‑step model, a relative improvement of more than 50 %.

2‑NFE: FID = 2.20, comparable to the multi‑step baselines DiT (FID 2.27) and SiT (FID 2.15), which require 250 × 2 function evaluations (250 sampling steps, doubled by classifier‑free guidance).

CIFAR‑10

Unconditional generation attains FID scores competitive with prior work; exact numbers are reported in the paper's tables.

Ablation Study

Removing the MeanFlow identity from the training objective degrades performance, confirming that the identity‑based target is central to the final results.

Sample Outputs

High‑quality samples generated with a single function evaluation are illustrated below.

[Figure: MeanFlow equation]
[Table: ImageNet 256×256 results]
[Table: CIFAR‑10 results]
[Table: Ablation results]
[Figure: Generated samples]
Tags: flow matching, AI research, FID, ImageNet, MeanFlow, one-step diffusion
Written by AI Frontier Lectures, a leading AI knowledge platform.