Multi-dimensional Preference Score (MPS) for Text-to-Image Generation: Dataset, Architecture, and Experimental Analysis
This article introduces the Multi-dimensional Preference Score (MPS), the first multi-dimensional scoring model for evaluating text-to-image generation. MPS is trained on the newly released MHP dataset, which provides extensive human annotations across four dimensions: aesthetics, semantic alignment, detail quality, and overall preference. Comprehensive experiments and RLHF integration demonstrate its effectiveness.
We propose the Multi-dimensional Preference Score (MPS), the first multi-dimensional scoring model for evaluating text-to-image generation, trained on the newly released Multi-dimensional Human Preference (MHP) dataset, which contains 918,315 pairwise comparisons across four dimensions: aesthetics, semantic alignment, detail quality, and overall score.
MHP was built from a balanced set of prompts collected from multiple sources and augmented with GPT-4-generated prompts to cover long-tail categories; the prompts were then paired with images produced by diffusion, GAN, and autoregressive models, yielding over 600k images with extensive human annotations.
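Pairwise annotations like MHP's are typically turned into a training signal with a Bradley-Terry style objective: the model's scalar scores for the preferred and rejected image of a pair feed a logistic loss. A minimal sketch in plain Python (the function name and score values are illustrative, not from the paper):

```python
import math

def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry / logistic loss: small when the preferred image
    scores higher than the rejected one by a wide margin."""
    margin = score_preferred - score_rejected
    return math.log(1.0 + math.exp(-margin))  # = -log(sigmoid(margin))

# Illustrative scores for one annotated pair (hypothetical values):
ranked_correctly = pairwise_preference_loss(2.0, 0.5)  # small loss
ranked_inverted = pairwise_preference_loss(0.5, 2.0)   # large loss
print(ranked_correctly < ranked_inverted)  # True
```

Minimizing this loss over all annotated pairs pushes the scorer to reproduce the human ranking within each preference dimension.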
The MPS architecture extends CLIP with a preference‑condition module that injects a conditional mask into the cross‑attention layers, allowing the model to predict scores for each preference dimension while sharing a unified backbone.
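At a high level, the conditional mask gates which prompt tokens a given preference dimension attends to inside cross-attention. A toy sketch of that gating idea (the 2-d embeddings, similarity threshold, and function name are invented for illustration; the real module operates on CLIP token features):

```python
def conditional_mask(condition_vec, token_vecs, threshold=0.5):
    """Keep prompt tokens whose cosine similarity to the condition
    embedding exceeds a threshold; masked-out tokens (0) would be
    excluded from cross-attention for that preference dimension."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)
    return [1 if cos(condition_vec, t) > threshold else 0 for t in token_vecs]

# Toy 2-d embeddings: an "aesthetics" condition and three prompt tokens.
aesthetics = [1.0, 0.0]
tokens = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(conditional_mask(aesthetics, tokens))  # [1, 0, 1]
```

Because only the mask depends on the condition, all dimensions can share the same CLIP backbone while still scoring different aspects of the prompt-image pair.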
Extensive experiments on three public benchmarks and our MHP benchmark show that MPS outperforms existing scoring methods on overall and per‑dimension metrics, and visualizations using Grad‑CAM demonstrate that the conditional mask focuses on relevant prompt tokens.
We also integrate MPS into reinforcement learning from human feedback (RLHF) pipelines (e.g., PPO, DPO) to fine‑tune large text‑to‑image models, improving aesthetic quality and realism. The model, dataset, and code are publicly released.
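Short of full PPO/DPO fine-tuning, the simplest way a preference scorer plugs into a generation pipeline is best-of-n reranking: sample several candidates and keep the one the scorer prefers. This is a generic technique, not the paper's RLHF setup; the scorer below is a stand-in for MPS conditioned on one dimension:

```python
def best_of_n(candidates, score_fn):
    """Return the candidate with the highest preference score."""
    return max(candidates, key=score_fn)

# Stand-in scorer with hypothetical scores; in practice score_fn
# would call the preference model on each generated image.
images = ["img_a", "img_b", "img_c"]
fake_scores = {"img_a": 0.2, "img_b": 0.9, "img_c": 0.5}
print(best_of_n(images, fake_scores.get))  # img_b
```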
Kuaishou Tech