Meituan’s End‑to‑End AIGC Poster Generation: The Generate‑Edit‑Judge System

The article details Meituan’s AIGC poster‑creation pipeline, which tackles design‑resource scarcity, rapid turnaround, and quality control by integrating three open‑source models—PosterCraft for end‑to‑end generation, PosterOmni for unified multi‑task editing, and PosterReward for automated quality assessment—forming a self‑evolving generate‑edit‑judge loop.

Meituan Technology Team

Jun 18, 2026

Background and Challenges

Generating a commercial poster traditionally requires half‑day design work or costly outsourcing, while millions of small merchants need minute‑level delivery and consistent quality at scale. The key challenges are (1) precise text rendering with zero tolerance for errors, (2) harmonious layout respecting design principles such as contrast, repetition, alignment and proximity, (3) unified aesthetic style across diverse domains (food, beauty, technology), (4) multi‑task support for local editing and global composition, and (5) quantifiable quality evaluation beyond generic image metrics.

Technical System Overview

Meituan’s intelligent creation team built a closed‑loop system composed of three core modules: a generation model (PosterCraft), a unified multi‑task editing model (PosterOmni), and a reward/evaluation model (PosterReward). The three components are mutually supportive: the reward model drives generation improvement, the generation model expands the editing frontier, and the editing model feeds back refined data to the evaluator.

PosterCraft (ICLR 2026)

PosterCraft abandons the traditional modular pipeline and optimises text, visual content, and layout jointly in an end‑to‑end diffusion framework. Training follows a four‑stage cascade:

Stage 1 – Large‑scale text‑render optimisation: the base model is fine‑tuned with Flow Matching on a 2 M‑sample dataset (Text‑Render‑2M), which sharply reduces missing or garbled characters.

Stage 2 – High‑quality poster fine‑tuning & region‑aware calibration: HQ‑Poster‑100K provides >100 K curated posters; region weights (non‑text 1.0, primary text 0.6, secondary text 0.2) bias the loss toward visual aesthetics while preserving text fidelity.
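The region weights in Stage 2 can be read as a per-element weighted reconstruction loss. The following is a minimal sketch under stated assumptions, not PosterCraft's actual training code: the label names, the squared-error form, and the normalisation by total weight are all illustrative choices. Only the three weight values come from the article.

```python
# Region weights quoted in the article; the label names are illustrative.
REGION_WEIGHTS = {"non_text": 1.0, "primary_text": 0.6, "secondary_text": 0.2}


def region_weighted_loss(pred, target, regions, weights=REGION_WEIGHTS):
    """Weighted mean squared error over flattened pixels or latents.

    pred, target: equal-length sequences of floats.
    regions: one region label per element, same length as pred.
    Normalising by the total weight is an assumption of this sketch.
    """
    if not (len(pred) == len(target) == len(regions)):
        raise ValueError("pred, target and regions must have equal length")
    weighted_sum = 0.0
    weight_total = 0.0
    for p, t, r in zip(pred, target, regions):
        w = weights[r]
        weighted_sum += w * (p - t) ** 2
        weight_total += w
    return weighted_sum / weight_total if weight_total else 0.0
```

With these weights, an error of a given size in a non‑text region contributes five times as much to the loss as the same error in a secondary‑text region.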

Stage 3 – Aesthetic‑text preference learning: Poster‑Preference‑100K supplies 5‑image sets per prompt, scored by HPSv2 and verified by Gemini; best‑of‑N preference optimisation (DPO) teaches colour harmony and balanced composition.
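To make the Stage 3 preference step concrete, here is a hedged sketch of how a scored candidate set could be turned into one chosen/rejected pair, followed by the standard DPO objective written on scalar log-likelihoods. The best‑versus‑worst pairing rule and the `beta` value are assumptions, and the article does not say how the log-likelihood terms are obtained for a diffusion model.

```python
import math


def best_worst_pair(candidates):
    """candidates: list of (image_id, score). Returns (chosen_id, rejected_id).

    Pairing the top-scored image against the bottom-scored one is an
    assumption of this sketch.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0][0], ranked[-1][0]


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls as the policy assigns relatively more likelihood to the chosen poster than the frozen reference model does.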

Stage 4 – Visual‑language feedback refinement: Poster‑Reflect‑120K generates six variants per prompt, Gemini selects the best and provides structured feedback; InternVL‑3‑8B is fine‑tuned as a VLM critic to supply iterative refinement during inference.
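The inference‑time refinement described in Stage 4 amounts to a generate, critique, regenerate loop. The sketch below shows only that control flow. `generate` and `critique` are hypothetical callables standing in for the diffusion model and the VLM critic, and the stopping rule and round limit are assumptions.

```python
def refine_with_critic(prompt, generate, critique, max_rounds=3):
    """Iteratively regenerate a poster until the critic accepts it.

    generate(prompt, feedback) -> image   (feedback is None on the first call)
    critique(prompt, image)    -> (accepted: bool, feedback: str)
    Both callables are placeholders for real models.
    """
    image = generate(prompt, None)
    for _ in range(max_rounds):
        accepted, feedback = critique(prompt, image)
        if accepted:
            break
        image = generate(prompt, feedback)
    return image
```

Capping the number of rounds bounds inference cost when the critic never accepts a candidate.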

On text‑recall, F‑score and accuracy, PosterCraft surpasses all open‑source baselines and approaches the performance of top‑tier commercial systems such as Gemini 2.0‑Flash‑Gen.

PosterOmni (CVPR 2026)

PosterOmni treats image‑to‑poster creation as a unified task covering six design operations: Extending, Filling, Rescaling, ID‑driven editing, Layout‑driven generation, and Style‑driven generation, the last two being reference‑based. The workflow consists of four stages:

Stage 1 – Automated data construction (PosterOmni‑200K): prompts are expanded into structured descriptions, candidate images are generated with strong T2I models (e.g., Qwen‑Image), and multi‑modal filters (PaddleOCR, jina‑clip‑v2) remove noisy samples. The resulting dataset contains more than 200 K high‑quality image‑poster pairs across six thematic domains.
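A filter of the kind described in Stage 1 can be sketched as two threshold checks: does the text recognised by OCR match the text the prompt asked for, and is the image semantically close to its description. The example assumes both signals were computed upstream (for instance by PaddleOCR and jina‑clip‑v2). The field names and thresholds are hypothetical and are not taken from the PosterOmni pipeline.

```python
from difflib import SequenceMatcher


def _normalise(text):
    """Lower-case and drop all whitespace before comparing strings."""
    return "".join(text.lower().split())


def text_match_ratio(expected, recognised):
    """Similarity in [0, 1] between the requested text and the OCR output."""
    return SequenceMatcher(None, _normalise(expected), _normalise(recognised)).ratio()


def keep_sample(sample, min_text_ratio=0.9, min_clip_sim=0.25):
    """Keep a sample only if both the text check and the semantic check pass.

    sample: dict with hypothetical keys 'expected_text', 'ocr_text' and
    'clip_similarity' (a precomputed image-text similarity score).
    """
    text_ok = text_match_ratio(sample["expected_text"], sample["ocr_text"]) >= min_text_ratio
    semantic_ok = sample["clip_similarity"] >= min_clip_sim
    return text_ok and semantic_ok
```

In practice the thresholds would be tuned on a small hand‑labelled subset, since overly strict text matching discards posters with stylised but correct lettering.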

Stage 2 – Expert distillation: separate expert models are trained for local editing (Extending/Filling/Rescaling/ID‑driven) and global composition (Layout‑driven/Style‑driven). Knowledge is distilled into a single student model (PosterOmni‑SFT) using a combined loss:

L_total = L_text_render + λ·L_distill

Stage 3 – Unified reward training: a preference dataset (PosterOmni‑Preference‑70K) is built by generating multiple candidates per prompt, filtering with Gemini‑2.5‑Pro, and applying a negative‑pair strategy that treats the unchanged reference as a rejected sample and the edited output as chosen. The reward model (Qwen3‑VL encoder + MLP head) is trained with a Bradley‑Terry ranking loss.
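The Bradley‑Terry ranking loss and the negative‑pair strategy from Stage 3 can be illustrated on scalar reward scores. This is a sketch of the textbook formulation only: the dictionary layout for pairs is an assumption, and the real model produces its scores from a vision‑language encoder rather than taking them as inputs.

```python
import math


def bradley_terry_loss(score_chosen, score_rejected):
    """-log(sigmoid(score_chosen - score_rejected)), computed stably."""
    diff = score_chosen - score_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def reference_negative_pair(reference_id, edited_id):
    """Negative-pair strategy: the unchanged reference is the rejected sample."""
    return {"chosen": edited_id, "rejected": reference_id}
```

Treating the untouched reference as rejected penalises a reward model that would otherwise score "no edit performed" as highly as a correct edit.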

Stage 4 – Omni‑Edit reinforcement learning: DiffusionNFT‑style RL optimises the diffusion process using the task‑aware reward, encouraging both aesthetic quality and task compliance.

PosterOmni achieves the best open‑source scores on the PosterOmni‑Bench (1 020 bilingual instructions, six tasks, six poster themes) and outperforms several closed‑source baselines, especially on layout‑driven and style‑driven tasks where it learns genuine design rules rather than copying reference elements.

PosterReward (CVPR 2026)

PosterReward is the first reward model dedicated to poster quality. It combines three evaluation dimensions: structured layout analysis, colour‑scheme assessment, and atmosphere‑style recognition. The system first performs a marketing‑poster structural parsing (12 element classes, >90 % accuracy) and then feeds the element coordinates to a CNN that predicts a 5‑point composition score (MAE 0.38, 90 % of predictions within 1 point). Colour‑scheme identification reaches 96.2 % accuracy over 11 colour families, and style classification attains 91.5 % accuracy across 12 common poster styles.
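The two composition‑score figures quoted above (mean absolute error, and the share of predictions within one point) are simple to compute from paired predictions and human ratings. A minimal sketch, using made‑up numbers purely for illustration:

```python
def composition_metrics(predicted, actual, tolerance=1.0):
    """Return (mean absolute error, fraction of predictions within tolerance)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    mae = sum(errors) / len(errors)
    within = sum(1 for e in errors if e <= tolerance) / len(errors)
    return mae, within


# Illustrative values only, not data from the article.
mae, within_one = composition_metrics([4.0, 3.5, 2.0, 5.0], [4.0, 3.0, 4.0, 5.0])
```

Reporting both numbers is useful because a low MAE can hide a small number of large misses, which the within‑one‑point rate exposes.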

For generated posters, PosterReward is trained on 70 K preference pairs covering text rendering, layout, aesthetics and instruction compliance. Training follows four stages: joint supervised fine‑tuning on single‑image and pairwise data, joint rejection‑sampling fine‑tuning with Gemini‑2.5‑Flash‑Lite, score‑module training with Bradley‑Terry loss, and final reinforcement learning (GRPO) using the frozen score module as reward.
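In the final GRPO stage, the core idea is that each sampled response is scored relative to the other responses drawn for the same input. The sketch below shows only that group‑relative advantage step, assuming the rewards have already been produced by the frozen score module. The epsilon term and the use of the population standard deviation are implementation assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are centred within each group, no separate value network is needed to provide a baseline.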

On the PosterRewardBench‑Advanced benchmark, PosterReward achieves 86.0 % accuracy, far above existing baselines (40‑53 %). The model also provides pairwise judgments with reduced positional bias, demonstrating stable preference prediction.
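One common way to check pairwise accuracy and positional bias together is to query the judge in both presentation orders and count a pair as correct only if both answers agree with the label. The article does not describe its evaluation protocol, so the following is an assumed procedure with a hypothetical `judge` callable.

```python
def swap_consistent_accuracy(pairs, judge):
    """Accuracy where a pair counts only if the judge is right in both orders.

    pairs: list of (preferred, other) items with the human-preferred one first.
    judge(a, b): returns 0 if it prefers a (first position), 1 if it prefers b.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = 0
    for preferred, other in pairs:
        forward_ok = judge(preferred, other) == 0
        backward_ok = judge(other, preferred) == 1
        if forward_ok and backward_ok:
            correct += 1
    return correct / len(pairs)
```

A judge that always favours the first position scores zero under this metric, even though it would score 50 % on a single‑order test with balanced ordering.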

Closed‑Loop Interaction

The three modules form a self‑evolving post‑training system: PosterCraft supplies high‑quality generation data, PosterOmni expands capabilities to multi‑task editing while inheriting the reward signal, and PosterReward continuously evaluates both generated and edited outputs, feeding back gradients that improve the upstream models.

Practical Deployments

Meituan has integrated PosterCraft into its “text‑to‑post” feature for automatic vertical poster creation, and collaborated with designers to produce brand‑IP visuals (e.g., Kangaroo‑Team posters). PosterOmni is used for product‑poster generation where the subject identity is preserved while the layout and style are adapted to new contexts.

Conclusion and Outlook

By unifying generation, editing, and evaluation, Meituan’s AIGC system demonstrates that end‑to‑end poster creation can achieve commercial‑grade quality at scale. Future work will focus on finer controllability, broader scenario coverage (including dynamic visual content), deeper evaluation dimensions, and tighter integration of design standards into the reinforcement learning loop.
