Multi-dimensional Preference Score (MPS) for Text-to-Image Generation: Dataset, Architecture, and Experimental Analysis
This article introduces the Multi-dimensional Preference Score (MPS), the first multi-dimensional scoring model for evaluating text-to-image generation. MPS is trained on the newly released MHP dataset, which provides extensive human annotations across four dimensions: aesthetics, semantic alignment, detail quality, and overall preference. Comprehensive experiments and RLHF integration demonstrate its effectiveness.
We propose the Multi-dimensional Preference Score (MPS), the first multi-dimensional scoring model for evaluating text-to-image generation, trained on the newly released Multi-dimensional Human Preference (MHP) dataset, which contains 918,315 pairwise comparisons across four dimensions: aesthetics, semantic alignment, detail quality, and overall score.
MHP was built from a balanced set of prompts collected from multiple sources and augmented with GPT-4-generated prompts to cover long-tail categories; the prompts were then paired with images produced by diffusion, GAN, and autoregressive models, yielding over 600k images with extensive human annotations.
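Pairwise annotations like MHP's are typically turned into a training signal with a Bradley-Terry style objective: the model's scalar scores for the preferred and rejected image of a pair feed a logistic loss. A minimal sketch in plain Python (the function name and score values are illustrative, not from the paper):

```python
import math

def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry / logistic loss: small when the preferred image
    scores higher than the rejected one by a wide margin."""
    margin = score_preferred - score_rejected
    return math.log(1.0 + math.exp(-margin))  # = -log(sigmoid(margin))

# Illustrative scores for one annotated pair (hypothetical values):
ranked_correctly = pairwise_preference_loss(2.0, 0.5)  # small loss
ranked_inverted = pairwise_preference_loss(0.5, 2.0)   # large loss
print(ranked_correctly < ranked_inverted)  # True
```

Minimizing this loss over all annotated pairs pushes the scorer to reproduce the human ranking within each preference dimension.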
The MPS architecture extends CLIP with a preference‑condition module that injects a conditional mask into the cross‑attention layers, allowing the model to predict scores for each preference dimension while sharing a unified backbone.
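At a high level, the conditional mask gates which prompt tokens a given preference dimension attends to inside cross-attention. A toy sketch of that gating idea (the 2-d embeddings, similarity threshold, and function name are invented for illustration; the real module operates on CLIP token features):

```python
def conditional_mask(condition_vec, token_vecs, threshold=0.5):
    """Keep prompt tokens whose cosine similarity to the condition
    embedding exceeds a threshold; masked-out tokens (0) would be
    excluded from cross-attention for that preference dimension."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)
    return [1 if cos(condition_vec, t) > threshold else 0 for t in token_vecs]

# Toy 2-d embeddings: an "aesthetics" condition and three prompt tokens.
aesthetics = [1.0, 0.0]
tokens = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(conditional_mask(aesthetics, tokens))  # [1, 0, 1]
```

Because only the mask depends on the condition, all dimensions can share the same CLIP backbone while still scoring different aspects of the prompt-image pair.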
Extensive experiments on three public benchmarks and our MHP benchmark show that MPS outperforms existing scoring methods on overall and per‑dimension metrics, and visualizations using Grad‑CAM demonstrate that the conditional mask focuses on relevant prompt tokens.
We also integrate MPS into reinforcement learning from human feedback (RLHF) pipelines (e.g., PPO, DPO) to fine‑tune large text‑to‑image models, improving aesthetic quality and realism. The model, dataset, and code are publicly released.
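Short of full PPO/DPO fine-tuning, the simplest way a preference scorer plugs into a generation pipeline is best-of-n reranking: sample several candidates and keep the one the scorer prefers. This is a generic technique, not the paper's RLHF setup; the scorer below is a stand-in for MPS conditioned on one dimension:

```python
def best_of_n(candidates, score_fn):
    """Return the candidate with the highest preference score."""
    return max(candidates, key=score_fn)

# Stand-in scorer with hypothetical scores; in practice score_fn
# would call the preference model on each generated image.
images = ["img_a", "img_b", "img_c"]
fake_scores = {"img_a": 0.2, "img_b": 0.9, "img_c": 0.5}
print(best_of_n(images, fake_scores.get))  # img_b
```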
Kuaishou Tech