BeautyGRPO: RL‑Driven Realistic Portrait Retouching That Ends Over‑Beautification (CVPR 2026)
The paper introduces BeautyGRPO, a reinforcement‑learning framework that combines a fine‑grained preference dataset (FRPref‑10K) with Dynamic Path Guidance to balance aesthetic enhancement against high‑fidelity preservation in portrait retouching, outperforming existing SFT and RL models on both perceptual metrics and user preference.
Background
Human preference alignment has become a core challenge in computer‑vision and generative‑AI research. The widespread demand for high‑quality portrait retouching requires models to remove blemishes while preserving authentic skin texture, pores, and identity‑defining features, creating a tension between high fidelity and subjective aesthetic preference.
Industry Pain Point
Most current portrait‑retouching models (e.g., RetouchFormer) and general large image‑editing models (e.g., NanoBanana) rely on supervised learning and supervised fine‑tuning (SFT). Their pixel‑level losses force the model to mimic possibly flawed reference images, leading to over‑smooth, plastic‑looking results.
Online reinforcement learning (RL) approaches such as FlowGRPO convert deterministic ODEs to stochastic SDEs to encourage aesthetic exploration, but the injected stochastic drift accumulates noise and produces visible artifacts on faces.
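For intuition, here is a minimal Euler–Maruyama sketch of what converting a deterministic flow ODE into an SDE sampler involves; `velocity_fn`, `sigma`, and the step count are illustrative placeholders, not FlowGRPO's actual drift correction. The point is that fresh Gaussian noise enters at every step, and that accumulated noise is what shows up as facial artifacts.

```python
import torch

def sample_sde(x, velocity_fn, num_steps=50, sigma=0.3):
    """Minimal Euler-Maruyama sketch of turning a deterministic flow ODE
    into an SDE sampler (illustrative only, not FlowGRPO's exact drift)."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        drift = velocity_fn(x, t)             # deterministic ODE velocity
        noise = sigma * torch.randn_like(x)   # injected exploration noise
        # Each step adds fresh Gaussian noise; over many steps this
        # stochastic component accumulates and can surface as artifacts.
        x = x + drift * dt + noise * (dt ** 0.5)
    return x
```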
Core Breakthroughs
1. Fine‑Grained Preference Modeling (FRPref‑10K)
The authors built FRPref‑10K, a dataset of 10,000 high‑resolution preference pairs annotated along five dimensions: skin smoothness, blemish removal, texture fidelity, clarity, and identity preservation. On this dataset they trained a multi‑dimensional reward model built on a vision‑language model with chain‑of‑thought (CoT) reasoning and used it to drive GRPO optimization, enabling the model to distinguish subtle trade‑offs such as preserving natural pores while smoothing acne.
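The post does not spell out how the five dimension scores are combined, but a minimal sketch of the standard GRPO recipe, scalarizing per-dimension rewards and computing group-relative advantages, might look like the following; the uniform weights, the `group_relative_advantages` helper, and the example scores are assumptions for illustration only.

```python
import numpy as np

# Dimension names follow FRPref-10K; weights are an illustrative assumption.
DIMENSIONS = ["smoothness", "blemish_removal", "texture_fidelity",
              "clarity", "identity_preservation"]

def group_relative_advantages(scores, weights=None):
    """scores: (G, 5) per-dimension rewards for a group of G retouched
    candidates generated from the same input portrait."""
    scores = np.asarray(scores, dtype=np.float64)
    if weights is None:
        weights = np.ones(len(DIMENSIONS)) / len(DIMENSIONS)
    reward = scores @ weights                      # scalar reward per sample
    # GRPO normalizes within the group instead of learning a value function.
    return (reward - reward.mean()) / (reward.std() + 1e-8)

# Example: 4 candidates scored by a (hypothetical) VLM reward model.
group_scores = [
    [0.90, 0.80, 0.60, 0.70, 0.95],
    [0.70, 0.90, 0.90, 0.80, 0.97],
    [0.95, 0.95, 0.30, 0.60, 0.90],   # over-smoothed: texture fidelity drops
    [0.60, 0.50, 0.95, 0.70, 0.98],
]
print(group_relative_advantages(group_scores))
```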
2. Dynamic Path Guidance (DPG)
DPG introduces a flexible "anchor‑constraint" that plans a deterministic trajectory toward a high‑quality reference anchor at each sampling step. The correction vector is linearly mixed with standard Gaussian noise, with the mixing ratio changing over time:
Early sampling (high‑noise stage): The correction vector receives higher weight, pulling the trajectory back to the high‑fidelity manifold and preserving facial structure.
Late sampling (detail generation stage): The weight of the correction vector decreases, allowing controlled stochastic exploration to improve aesthetic texture.
DPG keeps the exploration trajectory near the high‑fidelity manifold, achieving a balance between realism and aesthetic improvement.
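A minimal sketch of this time-dependent mixing might look like the following; the linear weight schedule `w = w_max * t`, the normalization of the correction vector, and the function signature are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def dpg_step(x_t, anchor, t, sigma_t, base_update, w_max=0.8):
    """Illustrative Dynamic Path Guidance step (not the paper's equations).
    x_t: current latent; anchor: latent of a high-quality reference;
    t in [0, 1] with t=1 at the high-noise start of sampling;
    base_update: the sampler's ordinary update for this step."""
    # Deterministic correction vector pointing toward the anchor.
    correction = anchor - x_t
    correction = correction / (correction.flatten().norm() + 1e-8)
    # Standard Gaussian exploration noise.
    eps = torch.randn_like(x_t)
    # Time-dependent mixing: the correction dominates early (high noise),
    # while exploration noise takes over in the late detail-generation stage.
    w = w_max * t
    guided_noise = w * correction + (1.0 - w) * eps
    return x_t + base_update + sigma_t * guided_noise
```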
Experimental Evaluation
Tests on the FFHQ‑R and In‑the‑Wild datasets show that BeautyGRPO outperforms existing professional retouching models and general image‑editing models on a suite of no‑reference perceptual metrics (NIMA, MUSIQ, MANIQA, TOPIQ) while retaining identity, with ArcFace similarity scores above 0.95.
Visual Comparison
Baseline models either miss blemishes or produce an artificial "silicone‑skin" effect, while BeautyGRPO removes dark spots and acne while preserving natural skin texture, pores, and personal features such as moles.
User Preference Blind Test
A double‑blind study with 100 participants of varying ages and editing experience gave BeautyGRPO a 63.25% win rate, far ahead of the second‑best method at 12.00%.
Generalization to Large‑Scale Editing Models
The framework was also applied to the Qwen‑Image‑Edit large model. After integration, the model showed reduced identity drift and less over‑smoothing on facial regions, demonstrating plug‑and‑play generalization.
Conclusion
BeautyGRPO moves portrait retouching from pixel‑level supervision to RL‑driven aesthetic alignment, preserving fine‑grained realism while achieving higher aesthetic scores. Accepted to CVPR 2026, it signals a shift in on‑device computational photography toward algorithms that respect both natural skin texture and personal identity.