
Uni-Layout: Unifying Layout Generation with Human Feedback and Dynamic Alignment

Uni-Layout introduces a unified framework that combines a multimodal large language model‑based generator, a human‑like evaluator trained on the large Layout‑HF100k dataset, and a Dynamic Margin Preference Optimization (DMPO) method to align generation and evaluation, achieving state‑of‑the‑art results across diverse layout tasks.

JD Cloud Developers

Jan 15, 2026


Background and Motivation

Layout generation is crucial for designing e‑commerce images, posters, UI, and magazines, but existing methods are task‑specific and evaluated with metrics that often diverge from human perception. Uni-Layout addresses these gaps by providing a unified generator, a human‑simulated evaluator, and an alignment mechanism.

Unified Cross‑Task Layout Generation

We propose a taxonomy based on two dimensions: whether the background (B) and the element (E) content are each free (F) or constrained (C). This yields four task types (BFEF, BCEF, BFEC, BCEC). Leveraging multimodal large language models (MLLMs), Uni-Layout implements a single generator that takes natural‑language prompts describing background and element constraints and produces coherent layouts for any of the four types.
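The taxonomy can be sketched in a few lines of Python. This is only an illustration: the `Constraint` enum, the `task_code` and `build_prompt` helpers, and the prompt wording are assumptions made for this example, not the prompt format Uni-Layout actually uses.

```python
from enum import Enum
from itertools import product


class Constraint(Enum):
    """Whether a dimension (background or element) is free or constrained."""
    FREE = "F"
    CONSTRAINED = "C"


def task_code(background: Constraint, element: Constraint) -> str:
    """Build a four-letter task code such as 'BCEF'."""
    return f"B{background.value}E{element.value}"


def build_prompt(background: Constraint, element: Constraint,
                 background_desc: str = "", element_desc: str = "") -> str:
    """Assemble a natural-language prompt (hypothetical wording)."""
    parts = [f"Task type: {task_code(background, element)}."]
    if background is Constraint.CONSTRAINED:
        parts.append(f"Background constraint: {background_desc}")
    else:
        parts.append("Background: unconstrained.")
    if element is Constraint.CONSTRAINED:
        parts.append(f"Element constraint: {element_desc}")
    else:
        parts.append("Elements: unconstrained.")
    parts.append("Generate a layout as a list of element bounding boxes.")
    return " ".join(parts)


# Enumerate the four task types covered by the taxonomy.
ALL_TASKS = [task_code(b, e) for b, e in product(Constraint, repeat=2)]

if __name__ == "__main__":
    print(ALL_TASKS)
    print(build_prompt(Constraint.CONSTRAINED, Constraint.FREE,
                       background_desc="product photo, subject on the left"))
```

Because the task type is carried by the prompt rather than by the model architecture, one generator can in principle serve all four combinations.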

Human‑Feedback Dataset (Layout‑HF100k)

To train a human‑like evaluator, we collected Layout‑HF100k, the first large‑scale dataset containing 100,000 layouts annotated by humans as "acceptable" or "unacceptable" across representative tasks. This dataset supplies high‑quality supervision for modeling human aesthetic judgments.

Human‑Like Evaluator with Chain‑of‑Thought Reasoning

The evaluator processes layouts through two branches: visual features and geometric information. It includes a confidence‑estimation head for quantitative scores and a chain‑of‑thought (CoT) module that performs four reasoning steps:

Layout overview: generate a concise textual description of the overall composition.

Spatial deconstruction: analyze geometric properties, alignment patterns, and spacing consistency.

Aesthetic assessment: evaluate visual quality, balance, harmony, and rhythm.

Comprehensive judgment: combine previous insights to output a binary "acceptable"/"unacceptable" decision.
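The four reasoning steps above could be assembled into a single evaluator prompt roughly as follows. This is a minimal sketch under stated assumptions: the box format `(label, x, y, w, h)`, the instruction wording, and the `build_cot_prompt` and `parse_judgment` helpers are hypothetical, and the call to the evaluator model itself is left out.

```python
# The four chain-of-thought steps, paired with hypothetical instruction text.
COT_STEPS = [
    ("Layout overview",
     "Describe the overall composition concisely."),
    ("Spatial deconstruction",
     "Analyze geometric properties, alignment patterns, and spacing consistency."),
    ("Aesthetic assessment",
     "Evaluate visual quality, balance, harmony, and rhythm."),
    ("Comprehensive judgment",
     'Combine the previous insights and answer "acceptable" or "unacceptable".'),
]


def build_cot_prompt(layout_boxes):
    """Turn a list of (label, x, y, w, h) boxes into a step-by-step prompt."""
    lines = ["Evaluate the following layout, given as element boxes:"]
    for label, x, y, w, h in layout_boxes:
        lines.append(f"- {label}: x={x}, y={y}, w={w}, h={h}")
    for i, (name, instruction) in enumerate(COT_STEPS, start=1):
        lines.append(f"Step {i} ({name}): {instruction}")
    return "\n".join(lines)


def parse_judgment(response: str) -> bool:
    """Read the binary decision from the last line of a model response.

    Returns True for "acceptable" and False for "unacceptable".
    """
    lines = response.strip().splitlines()
    if not lines:
        raise ValueError("empty response")
    last = lines[-1].lower()
    # Check "unacceptable" first, since it contains "acceptable".
    if "unacceptable" in last:
        return False
    if "acceptable" in last:
        return True
    raise ValueError("no judgment found in the final line")


if __name__ == "__main__":
    boxes = [("title", 40, 30, 520, 80), ("logo", 40, 700, 120, 60)]
    print(build_cot_prompt(boxes))
```

Keeping the final decision on its own line makes the binary label easy to extract while the earlier steps remain available as an explanation.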

Dynamic Margin Preference Optimization (DMPO)

Traditional alignment methods treat all human preferences equally, ignoring preference strength. DMPO adapts the margin between paired layout scores based on the evaluator’s confidence: stronger preferences receive larger margins, weaker ones receive smaller margins. This confidence‑guided adaptive margin better captures the range of human judgments.

During alignment, the generator samples two candidate layouts (l1, l2). The evaluator scores each candidate from its visual (I) and geometric (l) information, and the score difference Δ is transformed by a nonlinear function f() before the DMPO loss is applied:

Loss = f(Δ) + ...
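The idea of a confidence-guided margin can be illustrated with a small numeric sketch. Everything specific here is an assumption: the article elides the full loss, so this example swaps in a DPO-style pairwise loss with a subtractive margin, picks `tanh` as the nonlinear function f, and uses hypothetical parameter names (`beta`, `scale`, `logratio_win`, `logratio_lose`). It shows the mechanism only, not the authors' formulation.

```python
import math


def dynamic_margin(score_win: float, score_lose: float, scale: float = 1.0) -> float:
    """Map the evaluator's score gap to a margin in [0, scale).

    tanh is an assumed choice for the nonlinear function f; a larger gap
    (a stronger preference) yields a larger margin.
    """
    delta = score_win - score_lose
    return scale * math.tanh(max(delta, 0.0))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def margin_preference_loss(logratio_win: float, logratio_lose: float,
                           score_win: float, score_lose: float,
                           beta: float = 0.1, scale: float = 1.0) -> float:
    """DPO-style loss with a dynamic margin (illustrative stand-in).

    logratio_* are log(policy / reference) values for the preferred and
    rejected layouts. The loss equals -log(sigmoid(z)).
    """
    margin = dynamic_margin(score_win, score_lose, scale)
    z = beta * (logratio_win - logratio_lose) - margin
    return softplus(-z)


if __name__ == "__main__":
    # Same policy log-ratios, different evaluator confidence gaps.
    strong = margin_preference_loss(2.0, 1.0, score_win=0.9, score_lose=0.1)
    weak = margin_preference_loss(2.0, 1.0, score_win=0.6, score_lose=0.4)
    print(f"strong preference loss: {strong:.4f}")
    print(f"weak preference loss:   {weak:.4f}")
```

With identical policy log-ratios, the pair with the larger evaluator gap incurs the larger loss, so the generator is pushed harder to separate layouts that the evaluator distinguishes with more confidence.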

Experimental Results

Evaluator Performance

Compared with leading general‑purpose models (GPT‑4o, Claude‑3.5 Sonnet, GLM‑4v, DeepSeek‑R1) under an LLM‑as‑Judge protocol, our evaluator achieves 85.5% accuracy, surpassing them by 25‑35% and far exceeding random‑level baselines.

Generator Performance

We benchmark against task‑specific SOTA models (e.g., LayoutDM), general‑purpose models (GPT‑4o, Claude‑3.5, DeepSeek‑R1), and open‑source MLLMs (LLaVA). Uni‑Layout consistently attains the best scores across metrics such as Ove, Ali, Max., Rcom, and Rsub for all four task types.

Human‑Simulation Evaluation

Measured by the LR score, which quantifies alignment with human judgments, Uni‑Layout reaches 0.702. This outperforms GPT‑4o (0.584), Claude‑3.5 (0.575), DeepSeek‑R1 (0.401), and LLaVA (0.422) by a large margin, and also exceeds the average of specialized baselines (0.658).

Conclusion

Uni‑Layout demonstrates that a unified generator, a human‑like evaluator trained on a large feedback dataset, and the DMPO alignment technique together produce visually appealing layouts that align closely with human preferences across diverse design tasks.
