Data Party THU
Oct 13, 2025 · Artificial Intelligence
How BranchGRPO Accelerates and Stabilizes Diffusion Model Alignment
BranchGRPO combines tree-structured branching, reward fusion, and lightweight pruning in a single framework that speeds up the training of diffusion and flow models while delivering denser, more stable reward signals. It achieves up to five-fold faster convergence and higher alignment scores on image and video generation benchmarks.
BranchGRPO · RLHF · diffusion models
