How TDM‑R1 Achieves 4‑Step Image Generation that Beats 80‑Step Models

Researchers from HKUST, CUHK, and XiaoHongShu introduced TDM‑R1, a reinforcement‑learning‑based method that enables 4‑step diffusion image generation to surpass 80‑step models in speed, fidelity, and adherence to complex instructions, as demonstrated on the GenEval benchmark and multiple quality metrics.


Background and Motivation

Generating high‑quality images and videos remains a core goal in AI research. Recent few‑step diffusion techniques, such as diffusion distillation, have dramatically increased generation speed—up to 50× faster—but still struggle with precise instruction following, complex text rendering, and accurate object placement.

Limitations of Existing Few‑Step Diffusion Models

Current reinforcement‑learning (RL) approaches for few‑step models assume that reward signals are differentiable: gradients must flow from a reward model back through the generator's output. This restriction excludes many real‑world feedback signals, such as human preferences, object counts, or OCR‑based spelling accuracy, that are inherently non‑differentiable.
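
To make the constraint concrete, here is a minimal PyTorch sketch; all names and argument shapes are illustrative assumptions, not the paper's API. The differentiable case back‑propagates the reward model's score into the generator, while a non‑differentiable signal such as an OCR spelling check returns a bare number that carries no gradient.

```python
import torch

# Hedged sketch of the differentiability constraint; generator, reward_model,
# and the argument shapes are illustrative assumptions, not the paper's code.
def differentiable_reward_step(generator, reward_model, noise, prompt_emb, opt):
    image = generator(noise, prompt_emb)     # autograd graph is kept
    score = reward_model(image, prompt_emb)  # differentiable scorer
    loss = -score.mean()
    opt.zero_grad()
    loss.backward()                          # gradient flows through reward_model
    opt.step()

def ocr_spelling_reward(image) -> float:
    # A real-world signal: run an external OCR engine and count correctly
    # spelled words. The result is a plain float with no autograd graph
    # attached, so it cannot replace reward_model in the loop above.
    ...
```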

TDM‑R1 Architecture and Training Procedure

The research team built on the Trajectory Distribution Matching (TDM) framework and introduced the TDM‑R1 architecture. TDM‑R1 splits learning into two stages: (1) a surrogate reward model learns from deterministic generation trajectories, and (2) the generator is trained using these unbiased intermediate rewards. Deterministic trajectories allow precise reward evaluation for each intermediate sample, avoiding the bias introduced by naïvely applying the final‑step score to all steps.
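
A hedged sketch of the two‑stage loop follows. The function names, the regression loss, and the per‑step labeling are assumptions about one plausible realization, not the authors' code. The key idea: with a deterministic (ODE) sampler, each intermediate sample fixes its final image, so the trajectory's reward is a precise label for every step, and the fitted surrogate is differentiable and can be maximized through the generator.

```python
import torch

def tdm_r1_iteration(generator, surrogate, sample_trajectory, true_reward,
                     gen_opt, sur_opt, prompts):
    """One training iteration (illustrative sketch).

    sample_trajectory(generator, prompt) -> list of intermediate samples
        from a deterministic (ODE) sampler, final image last.
    true_reward(image) -> float, the non-differentiable feedback
        (OCR accuracy, object count, human preference, ...).
    """
    # Stage 1: fit the surrogate on deterministic trajectories. Because the
    # sampler is deterministic, each intermediate x_t fixes its final image,
    # so the trajectory's final reward is a precise label for every x_t
    # (this would not hold under stochastic sampling).
    for prompt in prompts:
        with torch.no_grad():
            traj = sample_trajectory(generator, prompt)
            score = torch.tensor(float(true_reward(traj[-1])))
        for t, x_t in enumerate(traj):
            sur_loss = ((surrogate(x_t, t) - score) ** 2).mean()
            sur_opt.zero_grad()
            sur_loss.backward()
            sur_opt.step()

    # Stage 2: the surrogate *is* differentiable, so its score can be
    # maximized end-to-end through the few-step generator (in practice the
    # surrogate would be frozen during this step).
    for prompt in prompts:
        traj = sample_trajectory(generator, prompt)  # autograd graph kept
        gen_loss = -surrogate(traj[-1], 0).mean()
        gen_opt.zero_grad()
        gen_loss.backward()
        gen_opt.step()
```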

During each training iteration, the current model generates a batch of prompt‑conditioned samples. Each sample is scored along its trajectory, and the intermediate noisy samples are divided into positive and negative groups; a contrastive mechanism then trains the surrogate reward model to capture these fine‑grained preferences, as sketched below.
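
Below is one way the contrastive step could look, assuming a median split into positive and negative groups and a pairwise Bradley‑Terry‑style loss; the paper's exact grouping rule and objective may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_surrogate_update(surrogate, x_t, timesteps, scores, opt):
    """x_t: batch of intermediate noisy samples (B, C, H, W);
    timesteps: (B,) step indices; scores: (B,) trajectory rewards."""
    # Split the batch into high- and low-reward groups (the median split
    # is an assumption; the actual grouping rule is not specified here).
    pos_mask = scores >= scores.median()
    s_pos = surrogate(x_t[pos_mask], timesteps[pos_mask])    # (P,)
    s_neg = surrogate(x_t[~pos_mask], timesteps[~pos_mask])  # (N,)

    # Pairwise logistic loss: the surrogate should rank every positive
    # sample above every negative one, capturing fine-grained preferences.
    margins = s_pos.unsqueeze(1) - s_neg.unsqueeze(0)        # (P, N)
    loss = -F.logsigmoid(margins).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```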

Experimental Results

The team evaluated TDM‑R1 on the challenging GenEval benchmark, which comprises six compositional generation tasks (object counting, spatial relations, attribute binding, and so on). The 4‑step Z‑Image model enhanced with TDM‑R1 achieved a remarkable 0.92, far surpassing the 0.63 of the original 80‑step SD3.5‑M model and the 0.66 of a 100‑step Z‑Image baseline; even GPT‑4o reached only 0.84.

Across all sub‑metrics—object counting, color recognition, positional accuracy, and attribute binding—TDM‑R1 consistently outperformed other few‑step models. Moreover, five independent, unseen image‑quality metrics confirmed that the 4‑step model did not sacrifice visual fidelity.

Ablation Studies

To verify the necessity of the surrogate reward model, the researchers compared several baselines. Directly combining a standard RL loss with few‑step distillation yielded negligible early improvement and later caused severe quality degradation, because the two objectives conflict. Training a student by distilling a pre‑trained, RL‑enhanced teacher also hit a performance ceiling early in training.

In contrast, TDM‑R1’s dynamic surrogate reward mechanism continuously absorbed new feedback, resulting in a steadily rising performance curve that far outpaced traditional distillation methods.

Conclusion

TDM‑R1 demonstrates that integrating a dynamic surrogate reward model with deterministic sampling trajectories effectively overcomes the non‑differentiable reward barrier in few‑step diffusion generation. The approach delivers 4‑step image synthesis that matches or exceeds the quality of 80‑step models while preserving fine‑grained instruction compliance, pointing to a viable path for future high‑efficiency AI image generation.

Tags: Diffusion Models, benchmarking, AI image synthesis, few‑step generation