May 25, 2026 · Artificial Intelligence

Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation

Flow‑OPD introduces on‑policy distillation into flow‑matching diffusion models, using a multi‑teacher online rollout framework and manifold‑anchor regularization to resolve the seesaw effect of single and mixed rewards, achieving superior multi‑task performance and surpassing specialist models in image generation.

Flow-OPDManifold Anchor Regularizationdiffusion models

0 likes · 9 min read

Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation