Machine Learning Algorithms & Natural Language Processing
Jun 21, 2026 · Artificial Intelligence
Rank‑Only Rewards Accelerate One‑Step Text‑to‑Image Preference Optimization 3.5×
DrPO introduces a rank‑only reward mechanism, built on drifting fields, for one‑step text‑to‑image models. It enables reinforcement‑learning post‑training without back‑propagating reward gradients, so it also works with non‑differentiable rewards. Training is 3.51× faster than with DRaFT, and generation quality improves on SD‑Turbo and SDXL‑Turbo.
DrPO · Drifting Model · HPSv3
11 min read
