Uni-Layout: Unifying Layout Generation with Human Feedback and Dynamic Alignment
Uni-Layout introduces a unified framework that combines a multimodal large language model‑based generator, a human‑like evaluator trained on the large Layout‑HF100k dataset, and a Dynamic Margin Preference Optimization (DMPO) method to align generation and evaluation, achieving state‑of‑the‑art results across diverse layout tasks.
Background and Motivation
Layout generation is crucial for designing e‑commerce images, posters, UIs, and magazines, but existing methods are task‑specific and are evaluated with metrics that often diverge from human perception. Uni-Layout addresses these gaps by providing a unified generator, a human‑like evaluator, and a mechanism that aligns the two.
Unified Cross‑Task Layout Generation
We propose a taxonomy based on two dimensions: whether the background (B) content is free (F) or constrained (C), and whether the element (E) content is free or constrained. This yields four task types (BFEF, BCEF, BFEC, BCEC). Leveraging multimodal large language models (MLLMs), Uni-Layout implements a single generator that takes natural‑language prompts describing background and element constraints and produces coherent layouts for any of the four types.
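The taxonomy can be sketched as a small data structure. This is only an illustration of how the four task codes follow from the two dimensions; the class and function names (LayoutTask, all_tasks) are hypothetical and do not come from the Uni-Layout code.

```python
from dataclasses import dataclass

# Hypothetical helper names; the source defines only the four task codes.
CONTENT_STATES = {"F": "free", "C": "constrained"}


@dataclass(frozen=True)
class LayoutTask:
    background: str  # "F" (free) or "C" (constrained)
    element: str     # "F" (free) or "C" (constrained)

    @property
    def code(self) -> str:
        # e.g. background constrained + element free -> "BCEF"
        return f"B{self.background}E{self.element}"

    def describe(self) -> str:
        return (
            f"{self.code}: background content is {CONTENT_STATES[self.background]}, "
            f"element content is {CONTENT_STATES[self.element]}"
        )


def all_tasks():
    # Enumerate in the order the article lists them: BFEF, BCEF, BFEC, BCEC.
    return [LayoutTask(b, e) for e in "FC" for b in "FC"]


if __name__ == "__main__":
    for task in all_tasks():
        print(task.describe())
```

A unified generator would receive the task type as part of its natural‑language prompt rather than through a separate model per code; how that prompt is worded is not specified in the text.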
Human‑Feedback Dataset (Layout‑HF100k)
To train a human‑like evaluator, we collected Layout‑HF100k, the first large‑scale dataset containing 100,000 layouts annotated by humans as "acceptable" or "unacceptable" across representative tasks. This dataset supplies high‑quality supervision for modeling human aesthetic judgments.
Human‑Like Evaluator with Chain‑of‑Thought Reasoning
The evaluator processes layouts through two branches: visual features and geometric information. It includes a confidence‑estimation head for quantitative scores and a chain‑of‑thought (CoT) module that performs four reasoning steps:
1. Layout overview: generate a concise textual description of the overall composition.
2. Spatial deconstruction: analyze geometric properties, alignment patterns, and spacing consistency.
3. Aesthetic assessment: evaluate visual quality, balance, harmony, and rhythm.
4. Comprehensive judgment: combine previous insights to output a binary "acceptable"/"unacceptable" decision.
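The four reasoning steps above can be sketched as a prompt template plus a verdict parser. This is a minimal illustration, not the evaluator's actual prompt: the box format (label, x, y, w, h), the wording of each instruction, and the function names are all assumptions made for the example.

```python
# Step names follow the article; the instruction wording is illustrative.
COT_STEPS = [
    ("Layout overview", "Describe the overall composition concisely."),
    ("Spatial deconstruction",
     "Analyze geometric properties, alignment patterns, and spacing consistency."),
    ("Aesthetic assessment", "Evaluate visual quality, balance, harmony, and rhythm."),
    ("Comprehensive judgment",
     'Combine the insights so far and answer "acceptable" or "unacceptable".'),
]


def build_cot_prompt(layout_boxes):
    """Build a text prompt from (label, x, y, w, h) tuples (hypothetical input format)."""
    lines = ["Elements (label: x, y, w, h):"]
    lines += [f"- {label}: {x}, {y}, {w}, {h}" for label, x, y, w, h in layout_boxes]
    for i, (name, instruction) in enumerate(COT_STEPS, start=1):
        lines.append(f"Step {i} ({name}): {instruction}")
    return "\n".join(lines)


def parse_verdict(response: str) -> bool:
    """Return True for 'acceptable' and False for 'unacceptable', read from the last line."""
    lines = response.strip().splitlines()
    if not lines:
        raise ValueError("empty response")
    last = lines[-1].lower()
    # Check the longer word first, because "acceptable" is a substring of it.
    if "unacceptable" in last:
        return False
    if "acceptable" in last:
        return True
    raise ValueError("no verdict found in the final line")
```

Parsing only the final line keeps the binary decision separate from the free‑text reasoning that precedes it, which matters because earlier steps may mention either word while discussing the layout.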
Dynamic Margin Preference Optimization (DMPO)
Traditional alignment methods treat all human preferences equally, ignoring preference strength. DMPO adapts the margin between paired layout scores based on the evaluator’s confidence: stronger preferences receive larger margins, weaker ones receive smaller margins. This confidence‑guided adaptive margin better captures the range of human judgments.
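To make the idea of a confidence‑guided margin concrete, here is a minimal sketch. It assumes a generic DPO‑style pairwise loss with an additive margin and uses tanh as the nonlinearity; neither choice is taken from the paper, whose exact f() and loss are not given in this article. All function names and the parameters max_margin, sharpness, and beta are hypothetical.

```python
import math


def dynamic_margin(conf_preferred, conf_rejected, max_margin=1.0, sharpness=4.0):
    """Map the evaluator's confidence gap to a margin in [0, max_margin).

    tanh is an illustrative nonlinearity, not the paper's f().
    """
    delta = conf_preferred - conf_rejected
    return max_margin * math.tanh(sharpness * max(delta, 0.0))


def margin_preference_loss(logp_preferred, logp_rejected,
                           ref_logp_preferred, ref_logp_rejected,
                           margin, beta=0.1):
    """Generic DPO-style loss with a margin: -log(sigmoid(reward_gap - margin)).

    This is an assumed form for illustration, not the DMPO loss itself.
    """
    reward_gap = beta * ((logp_preferred - ref_logp_preferred)
                         - (logp_rejected - ref_logp_rejected))
    z = reward_gap - margin
    # -log(sigmoid(z)) = log(1 + exp(-z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    strong = dynamic_margin(0.9, 0.1)    # clear preference -> larger margin
    weak = dynamic_margin(0.55, 0.45)    # near tie -> smaller margin
    print(strong, weak)
    print(margin_preference_loss(-10.0, -12.0, -11.0, -11.0, margin=strong))
    print(margin_preference_loss(-10.0, -12.0, -11.0, -11.0, margin=weak))
```

With a larger margin, the generator must separate the preferred layout from the rejected one by a wider reward gap before the loss becomes small, so strongly preferred pairs drive larger updates than near ties.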
During alignment training, the generator samples two candidate layouts (l1, l2). The evaluator computes visual (I) and geometric (l) scores for each, and the score difference Δ is transformed by a nonlinear function f() before the DMPO loss is applied:
Loss = f(Δ) + ...
Experimental Results
Evaluator Performance
Compared with leading general‑purpose models (GPT‑4o, Claude‑3.5 Sonnet, GLM‑4v, DeepSeek‑R1) under an LLM‑as‑Judge protocol, our evaluator achieves 85.5% accuracy, surpassing the others by 25‑35% and far exceeding random‑level baselines.
Generator Performance
We benchmark against task‑specific SOTA models (e.g., LayoutDM), general‑purpose LLMs (GPT‑4o, Claude‑3.5, DeepSeek‑R1), and open‑source MLLMs (LLaVA). Uni‑Layout consistently attains the best scores on metrics such as Ove, Ali, Max., Rcom, and Rsub across all four task types.
Human‑Simulation Evaluation
Using the LR score to measure alignment with human judgments, Uni‑Layout reaches 0.702, outperforming GPT‑4o (0.584), Claude‑3.5 (0.575), DeepSeek‑R1 (0.401), and LLaVA (0.422) by a large margin, and exceeds the average of specialized baselines (0.658).
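The size of each gap can be read directly from the reported numbers. The short script below only does that arithmetic on the LR scores quoted above; the dictionary keys are labels chosen for this example.

```python
# LR scores as quoted in the text above.
lr_scores = {
    "Uni-Layout": 0.702,
    "GPT-4o": 0.584,
    "Claude-3.5": 0.575,
    "DeepSeek-R1": 0.401,
    "LLaVA": 0.422,
    "Specialized baselines (avg)": 0.658,
}

ours = lr_scores["Uni-Layout"]

# Absolute LR gap between Uni-Layout and each comparison point.
gaps = {
    name: round(ours - score, 3)
    for name, score in lr_scores.items()
    if name != "Uni-Layout"
}

if __name__ == "__main__":
    for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
        print(f"{name}: +{gap:.3f}")
```

The gap over the specialized baselines is much smaller than the gap over the general‑purpose models, which is worth keeping in mind when reading "by a large margin".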
Conclusion
Uni‑Layout demonstrates that a unified generator, a human‑like evaluator trained on a large feedback dataset, and the DMPO alignment technique together produce visually appealing layouts that align closely with human preferences across diverse design tasks.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
