Tagged articles

one-step text-to-image

1 article · Page 1 of 1
Machine Heart
Jun 20, 2026 · Artificial Intelligence

DrPO: Ranking‑Only Rewards Boost One‑Step Text‑to‑Image Preference Optimization by 3.51×

DrPO introduces a ranking‑only reward that builds a drift field from on‑policy image samples to fine‑tune one‑step text‑to‑image models, achieving up to 3.51× faster training on large multimodal rewards, supporting non‑differentiable signals, and demonstrating superior quality across multiple benchmarks.

Drifting Preference Optimization · drift field · non-differentiable reward
14 min read