How Uni-Layout Unifies Cross‑Task Layout Generation with Human‑Like Evaluation

Uni-Layout introduces a unified framework that integrates a universal layout generator, a human‑feedback‑simulating evaluator, and a dynamic margin preference optimization technique to align generation and evaluation across diverse e‑commerce design tasks, backed by a new 100k human‑annotated dataset.

JD Tech

Background

Layout generation is essential for e‑commerce image design, but existing methods are task‑specific, and the metrics used to evaluate them often diverge from human perception. Both limitations restrict how widely these methods can be applied.

Unified Generation (Uni‑Layout)

Uni‑Layout classifies layout tasks along two binary dimensions: the background (B) and the element content (E) can each be free (F) or constrained (C). This yields four representative task types: BFEF, BCEF, BFEC, and BCEC. A multimodal large language model (MLLM) is trained jointly on all four types as a universal generator: it accepts natural‑language prompts describing the background and element constraints and produces coherent layouts in both free and constrained scenarios.
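The two‑axis taxonomy can be sketched as a small data structure. This is a minimal illustration only: the names Constraint, LayoutTask, and build_prompt are hypothetical, and the prompt wording is made up, since the actual prompt template is not given here.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Constraint(Enum):
    FREE = "F"
    CONSTRAINED = "C"


@dataclass(frozen=True)
class LayoutTask:
    """One cell of the 2x2 taxonomy: background (B) x element (E)."""
    background: Constraint
    element: Constraint

    @property
    def code(self) -> str:
        # e.g. constrained background + free elements -> "BCEF"
        return f"B{self.background.value}E{self.element.value}"


def build_prompt(task: LayoutTask) -> str:
    # Hypothetical wording; the real prompt format is an assumption.
    bg = ("a given background image"
          if task.background is Constraint.CONSTRAINED
          else "no fixed background")
    el = ("the given element contents"
          if task.element is Constraint.CONSTRAINED
          else "freely chosen elements")
    return f"[{task.code}] Generate a layout with {bg} and {el}."


# Enumerate all four task types from the two binary dimensions.
ALL_TASKS = [LayoutTask(b, e) for b, e in product(Constraint, repeat=2)]
TASK_CODES = [t.code for t in ALL_TASKS]
```

Encoding the task type in the prompt is one simple way a single model could be told which constraints apply; the source only states that constraints are described in natural language.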

Human‑Feedback Dataset (Layout‑HF100k)

To capture human judgments, the authors constructed Layout‑HF100k, a dataset of 100,000 layouts annotated with qualitative feedback across the four task types. Each entry contains a layout, the corresponding prompt, and human‑rated acceptability.
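As a rough picture of what one entry might look like in code, the sketch below models a record with the three parts named above (layout, prompt, acceptability). The field names, the (x, y, width, height) box convention, and the FeedbackRecord and acceptance_rate helpers are assumptions for illustration, not the dataset's published schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]

VALID_TASK_TYPES = {"BFEF", "BCEF", "BFEC", "BCEC"}


@dataclass
class LayoutElement:
    category: str  # hypothetical label, e.g. "logo" or "text"
    box: Box


@dataclass
class FeedbackRecord:
    task_type: str                 # one of the four task types
    prompt: str                    # prompt that produced the layout
    elements: List[LayoutElement]  # the layout itself
    acceptable: bool               # human acceptability judgment

    def __post_init__(self) -> None:
        if self.task_type not in VALID_TASK_TYPES:
            raise ValueError(f"unknown task type: {self.task_type}")


def acceptance_rate(records: List[FeedbackRecord]) -> float:
    """Fraction of records that annotators marked acceptable."""
    if not records:
        return 0.0
    return sum(r.acceptable for r in records) / len(records)
```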

Human‑Like Evaluator

The evaluator processes a layout through two parallel branches:

Visual branch extracts image‑level features.

Geometric branch encodes spatial relationships of elements.

It outputs a confidence score and a chain‑of‑thought (CoT) explanation consisting of four steps:

Layout Overview – brief textual summary of the composition.

Spatial Deconstruction – analysis of alignment, spacing, and overlap.

Aesthetic Assessment – evaluation of balance, harmony, and visual rhythm.

Comprehensive Judgment – final decision “acceptable” or “unacceptable”.
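The Spatial Deconstruction step reasons about alignment, spacing, and overlap. The evaluator learns these judgments from data, but hand‑written geometric checks give a feel for the quantities involved. The sketch below is illustrative only: the box convention (x, y, width, height), the function names, and the tolerance value are assumptions, not part of Uni‑Layout.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed box convention: (x, y, width, height).
Box = Tuple[float, float, float, float]


def intersection_area(a: Box, b: Box) -> float:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    # Negative width or height means the boxes do not intersect.
    return max(w, 0.0) * max(h, 0.0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection_area(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def max_pairwise_iou(boxes: List[Box]) -> float:
    """Worst-case overlap between any two elements (0.0 if fewer than two)."""
    return max((iou(a, b) for a, b in combinations(boxes, 2)), default=0.0)


def left_edges_aligned(boxes: List[Box], tol: float = 1.0) -> bool:
    """True if all left edges fall within `tol` of each other."""
    if len(boxes) < 2:
        return True
    xs = [b[0] for b in boxes]
    return max(xs) - min(xs) <= tol
```

For example, two 10×10 boxes offset horizontally by 5 units share half their area, which gives an IoU of one third.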

Dynamic Margin Preference Optimization (DMPO)

Traditional preference alignment treats all human preferences equally. DMPO instead adapts the margin between the scores of a layout pair to the strength of the preference: stronger preferences receive larger margins, weaker preferences smaller ones. For two candidate layouts l1 and l2, the evaluator produces visual embeddings I⁺ and geometric embeddings l⁺ and scores each layout from them. The score difference Δ between the two layouts is passed through a nonlinear function f(·) to set the margin, so the DMPO loss enforces larger gaps for strongly preferred pairs.
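The exact DMPO formula is not reproduced in this summary. To make the idea concrete, the sketch below uses a DPO‑style logistic loss with an added margin that grows with the preference strength Δ. The functional form, the tanh squashing, and the parameters beta and gamma are all assumptions chosen for illustration, not the authors' definition.

```python
import math


def dynamic_margin(delta: float, gamma: float = 1.0) -> float:
    """Assumed margin: bounded and increasing in preference strength."""
    return gamma * math.tanh(max(delta, 0.0))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dmpo_style_loss(reward_preferred: float,
                    reward_rejected: float,
                    delta: float,
                    beta: float = 1.0,
                    gamma: float = 1.0) -> float:
    """Logistic preference loss with a preference-dependent margin.

    The generator is only "done" with a pair once the reward gap
    exceeds the margin, so strongly preferred pairs (large delta)
    demand a larger gap than weakly preferred ones.
    """
    gap = beta * (reward_preferred - reward_rejected)
    return -log_sigmoid(gap - dynamic_margin(delta, gamma))
```

With identical reward gaps, a pair with a larger Δ incurs a higher loss under this form, which is the behavior described above: the optimizer is pushed to separate strongly preferred pairs further.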

Training Objective

The overall loss combines the DMPO term with standard cross‑entropy for the acceptability label, enabling the generator and evaluator to be jointly optimized.
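A combined objective of this kind is often written as a weighted sum. The sketch below assumes a binary acceptability label and a hypothetical weight ce_weight; the source does not state how the two terms are weighted, so treat the weighting as an assumption.

```python
import math


def binary_cross_entropy(p_acceptable: float, label: int,
                         eps: float = 1e-12) -> float:
    """Cross-entropy for a binary label (1 = acceptable, 0 = unacceptable)."""
    # Clamp the probability to avoid log(0).
    p = min(max(p_acceptable, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(preference_loss: float, p_acceptable: float, label: int,
               ce_weight: float = 1.0) -> float:
    """Preference (DMPO-style) term plus weighted cross-entropy term."""
    return preference_loss + ce_weight * binary_cross_entropy(p_acceptable, label)
```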

Experimental Results

Evaluator performance: Compared with closed‑source MLLMs (GPT‑4o, Claude‑3.5, GLM‑4v, DeepSeek‑R1) under an LLM‑as‑Judge protocol, Uni‑Layout’s evaluator achieves 85.5% accuracy, 25–35% higher than the baselines.

Generator performance: Across the four task types, Uni‑Layout outperforms task‑specific SOTA models (e.g., LayoutDM) and open‑source MLLMs on metrics such as Ove, Ali, Max, R_com, and R_sub. Its human‑simulation LR score (0.702) is also higher than those of GPT‑4o (0.584) and LLaVA (0.422).

These results demonstrate that a unified generator, a human‑like evaluator, and adaptive DMPO alignment together bridge the gap between automated layout synthesis and human aesthetic preferences, delivering higher‑quality designs for diverse e‑commerce scenarios.
