Uni-Layout: Harnessing Human Feedback for Unified Layout Generation and Evaluation

Uni-Layout introduces a unified framework that generates layouts across diverse tasks, simulates human evaluation with a novel feedback dataset, and aligns generation and assessment through dynamic margin preference optimization, achieving state‑of‑the‑art performance on multiple benchmarks.

JD Tech Talk

Background and Motivation

Layout generation is crucial for designing e‑commerce images, posters, UI screens, and magazine pages, but existing methods are task‑specific and evaluated with metrics that often diverge from human perception. This gap limits their applicability and yields layouts that score well on automated metrics yet look poor to human viewers.

Unified Cross‑Task Layout Generation

Uni‑Layout proposes a taxonomy along two dimensions—whether the background (b) and element (e) content are free (F) or constrained (C)—yielding four representative task types: BFEF, BCEF, BFEC, and BCEC. Leveraging multimodal large language models (MLLMs), a single unified generator processes natural‑language prompts together with visual constraints to produce coherent layouts for any of these scenarios.

A generic layout instruction can be written as T(b, e) → O, where T is the task description, b and e denote background and element attributes (which may be empty), and O specifies the output format. An example for a BCEC task is illustrated in the accompanying figure.
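The instruction template can be made concrete with a small sketch. The helper below is hypothetical (the paper's exact prompt wording and output schema are not given here); it only illustrates how leaving b or e empty versus filling them in produces the four task types:

```python
# Hypothetical sketch of composing a generic layout instruction T(b, e) -> O.
# Field names and the output schema are illustrative, not the paper's exact format.

def build_instruction(task, background=None, elements=None):
    """Compose a natural-language layout instruction.

    background/elements may be None (free) or provided (constrained),
    which yields the four task types BFEF, BCEF, BFEC, and BCEC.
    """
    parts = [f"Task: {task}"]
    parts.append(f"Background: {background or 'unconstrained'}")
    if elements:
        parts.append("Elements: " + "; ".join(elements))
    else:
        parts.append("Elements: unconstrained")
    parts.append("Output: a list of (category, x, y, w, h) boxes in normalized coordinates")
    return "\n".join(parts)

# BCEC example: both background and element content are constrained.
prompt = build_instruction(
    "Design a poster layout",
    background="product photo on the left half",
    elements=["title text", "logo", "call-to-action button"],
)
print(prompt)
```

Passing `background=None, elements=None` instead would give the fully free BFEF case, with the other two types in between.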

Human‑Feedback‑Driven Evaluation

To address the lack of human‑centric evaluation data, the authors compiled Layout‑HF100k, the first large‑scale dataset containing 100,000 human‑annotated layouts covering the representative task types. Using this dataset, they built an evaluator that simulates human judgment by fusing visual and geometric information through a chain‑of‑thought (CoT) reasoning process:

1. Layout overview – quick textual scan of the visual result.

2. Spatial deconstruction – analysis of geometric properties and alignment.

3. Aesthetic assessment – detailed evaluation of visual quality and design principles.

4. Comprehensive judgment – final “qualified” or “unqualified” decision with confidence estimation.

The evaluator outputs both a qualitative explanation and a quantitative confidence score, closely mirroring human judgment.
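The shape of the evaluator's output can be sketched as a simple record. The step names follow the article; the data structure itself and the example values are assumptions for illustration:

```python
# Illustrative sketch of the evaluator's four-step chain-of-thought output.
# The field names mirror the article's four steps; the structure is an assumption.

from dataclasses import dataclass

@dataclass
class EvaluationCoT:
    layout_overview: str          # step 1: quick textual scan of the result
    spatial_deconstruction: str   # step 2: geometry and alignment analysis
    aesthetic_assessment: str     # step 3: visual quality and design principles
    verdict: str                  # step 4: "qualified" or "unqualified"
    confidence: float             # confidence in [0, 1] accompanying the verdict

    def summary(self):
        return f"{self.verdict} (confidence {self.confidence:.2f})"

result = EvaluationCoT(
    layout_overview="Poster with centered title and bottom-left logo.",
    spatial_deconstruction="Title and CTA are left-aligned; no box overlap.",
    aesthetic_assessment="Balanced whitespace; consistent font scale.",
    verdict="qualified",
    confidence=0.91,
)
print(result.summary())  # -> qualified (confidence 0.91)
```

The confidence field is what later drives the dynamic margin in DMPO: a confident verdict signals a strong human preference.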

Dynamic Margin Preference Optimization (DMPO)

Traditional alignment methods treat all human preferences equally, ignoring varying preference strengths. DMPO adapts the margin between winning and losing layout scores based on the evaluator’s confidence: stronger preferences receive larger margins, while weaker ones receive smaller margins. This adaptive strategy better captures the spectrum of human judgments.

During preference optimization, the generator produces two candidate layouts, l1 and l2. The dual‑branch evaluator extracts visual (I+) and geometric (l+) features, computes a score for each candidate, and applies a nonlinear transformation f(·) to the score difference to obtain the dynamic margin. The DMPO loss is formulated around this margin, encouraging the generator to align with human‑perceived quality.
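A minimal numeric sketch of a dynamic‑margin, DPO‑style loss makes the mechanism concrete. The choice of `tanh` for the nonlinearity f(·) and the specific hyperparameters are assumptions; the paper's exact formulation may differ:

```python
# Sketch of a DPO-style loss with a dynamic margin, assuming the margin is a
# nonlinear function f (here tanh, an assumption) of the evaluator score gap.

import math

def dmpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
              score_w, score_l, beta=0.1, margin_scale=1.0):
    """DPO loss whose margin grows with the evaluator's preference strength."""
    # Implicit reward difference between winning and losing layouts,
    # measured against a frozen reference policy.
    reward_diff = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Dynamic margin: a larger evaluator score gap -> a larger required margin.
    margin = margin_scale * math.tanh(score_w - score_l)
    # Logistic (Bradley-Terry) negative log-likelihood with the margin applied.
    return -math.log(1.0 / (1.0 + math.exp(-(reward_diff - margin))))

# A confident preference (large score gap) demands a larger reward gap, so the
# same model log-probabilities incur a higher loss than under a weak preference.
strong = dmpo_loss(-10.0, -12.0, -10.5, -11.5, score_w=0.95, score_l=0.05)
weak = dmpo_loss(-10.0, -12.0, -10.5, -11.5, score_w=0.55, score_l=0.45)
print(strong > weak)  # True
```

With a fixed margin this reduces to standard margin‑DPO; the adaptive term is what lets strong and weak human preferences pull on the generator with different force.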

Experimental Results

Layout Evaluation Model : Compared against closed‑source LLM‑as‑judge models (GPT‑4o, Claude‑3.5 Sonnet, GLM‑4v, DeepSeek‑R1), Uni‑Layout’s evaluator achieved 85.5% accuracy, surpassing the next best by 25–35% and far outpacing models that performed near random.

Layout Generation Model : Benchmarked against task‑specific SOTA models (e.g., LayoutDM), closed‑source LLMs, and open‑source multimodal LLMs (LLaVA). Uni‑Layout consistently achieved the lowest error metrics (Ove, Ali) and the highest recall/completeness scores across BFEF, BFEC, BCEF, and BCEC tasks.

Human Simulation Evaluation : Using the LR score to measure alignment with human judgments, Uni‑Layout attained the highest LR of 0.702, beating GPT‑4o (0.584), Claude‑3.5 (0.575), DeepSeek‑R1 (0.401), and LLaVA (0.422) by substantial margins, and also outperforming specialized baselines (LayoutFlow, P&R, Poster‑Llama) with an average LR of 0.658.

Overall, the integration of a unified generator, a human‑feedback‑driven evaluator, and DMPO alignment demonstrates a significant advance in producing visually appealing, human‑aligned layouts across diverse design tasks.

Tags: evaluation, layout generation, multimodal LLM, AI design, human feedback
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.
