Machine Heart
May 25, 2026 · Artificial Intelligence
Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation
Flow‑OPD brings on‑policy distillation (OPD) to flow‑matching diffusion models. It combines a multi‑teacher online rollout framework with manifold‑anchor regularization to resolve the seesaw effect that arises with single and mixed rewards, achieving stronger multi‑task performance and surpassing specialist models in image generation.
Flow-OPD · Manifold Anchor Regularization · diffusion models
9 min read
