Tagged articles
1 articles
Page 1 of 1
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Jun 21, 2026 · Artificial Intelligence

Rank‑Only Rewards Accelerate One‑Step Text‑to‑Image Preference Optimization 3.5×

DrPO introduces a drifting‑field based, rank‑only reward mechanism for one‑step text‑to‑image models, enabling reinforcement‑learning‑after‑training without back‑propagating reward gradients; it speeds up training 3.51× versus DRaFT, works with non‑differentiable rewards, and improves generation quality on SD‑Turbo and SDXL‑Turbo.

DrPODrifting ModelHPSv3
0 likes · 11 min read
Rank‑Only Rewards Accelerate One‑Step Text‑to‑Image Preference Optimization 3.5×