How UNIFIEDREWARD Breaks Task Boundaries to Boost Image and Video Performance
The paper introduces UNIFIEDREWARD, which the authors present as the first unified reward model for assessing both multimodal understanding and generation. The model supports pairwise ranking as well as pointwise scoring. The authors build a 236K human-preference dataset spanning image and video tasks and apply Direct Preference Optimization (DPO) to align vision-language models (VLMs) and diffusion models, reporting significant performance gains on both image and video benchmarks.
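To make the DPO alignment step mentioned above more concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is not taken from the paper: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the paper's actual training setup for VLMs and diffusion models may differ. The inputs are log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") output under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair (illustrative sketch).

    All names and the default beta are assumptions, not the paper's settings.
    """
    # How much more the policy favors each output than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = chosen_gain - rejected_gain

    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # computed in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(round(dpo_loss(-2.0, -2.0, -2.0, -2.0), 4))  # 0.6931
```

The loss falls as the policy raises the chosen output's likelihood relative to the rejected one, measured against the reference model. In a pipeline like the one the summary describes, the chosen and rejected labels would come from the reward model's rankings or scores.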
