Content-aware Automatic Graphic Layout Generation
The paper introduces a perception-driven automatic graphic layout system. A domain-alignment module aligns advertising creatives with clean product images, and a multi-scale CNN-Transformer generator produces content-aware layouts, achieving higher aesthetic quality and lower element overlap than existing template-based and deep-learning methods.
Background: In online advertising, the visual appeal of creative layouts strongly influences click-through rates. Traditional automatic creative generation relies on fixed templates, which often cause element overlap with product images and lead to visual fatigue. Existing layout generation research focuses on internal relationships among graphic elements without fully leveraging image content.
Motivation: To address these issues, we propose a perception-driven layout generation method that adapts to the content of product images, ensuring effective presentation of the main subject and improved aesthetic quality.
Related Work: Early automatic layout methods use templates or heuristics, limiting flexibility. Recent deep-learning approaches such as LayoutGAN, LayoutVAE, and VTN generate layouts in a data-driven manner but ignore image content. ContentGAN incorporates visual semantics via a global feature vector, yet lacks spatial detail, leading to subject occlusion in advertising scenarios.
Method Design: We define a layout as a variable-length set of elements (logo, text, background, decoration). Two core challenges are addressed: (1) obtaining paired image-layout data, solved by a Domain Alignment Module (DAM) that aligns advertiser creative images with clean product images using inpainting (LaMa) and saliency detection; (2) exploiting image content during generation, solved by a multi-scale CNN backbone combined with a Transformer-based generator. The generator encodes multi-scale image features, decodes them with cross-attention to layout tokens, and predicts element classes and coordinates via fully-connected heads. User constraints can be incorporated to complete partial layouts.
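The generator described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the backbone depths, query count, feature dimensions, and head shapes are all assumptions, and the class list (plus a "no element" class) follows common DETR-style practice.

```python
import torch
import torch.nn as nn

class LayoutGenerator(nn.Module):
    """Hypothetical sketch: a small multi-scale CNN extracts image
    features; a Transformer decoder cross-attends from learnable
    layout queries (tokens) to those features; FC heads predict each
    element's class and normalized box coordinates."""

    def __init__(self, num_classes=4, num_queries=8, d_model=128):
        super().__init__()
        # Toy two-scale CNN backbone (real backbone unspecified here).
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, d_model, 3, stride=4, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(
            nn.Conv2d(d_model, d_model, 3, stride=2, padding=1), nn.ReLU())
        # Learnable layout tokens, one per candidate element slot.
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.cls_head = nn.Linear(d_model, num_classes + 1)  # +1: "no element"
        self.box_head = nn.Linear(d_model, 4)  # (cx, cy, w, h) in [0, 1]

    def forward(self, img):
        f1 = self.stage1(img)  # (B, C, H/4, W/4)
        f2 = self.stage2(f1)   # (B, C, H/8, W/8)
        # Flatten both scales into a single memory sequence of tokens.
        mem = torch.cat(
            [f.flatten(2).transpose(1, 2) for f in (f1, f2)], dim=1)
        q = self.queries.weight.unsqueeze(0).expand(img.size(0), -1, -1)
        h = self.decoder(q, mem)  # cross-attention to image features
        return self.cls_head(h), self.box_head(h).sigmoid()

model = LayoutGenerator()
logits, boxes = model(torch.randn(2, 3, 64, 64))
print(logits.shape, boxes.shape)
```

User-constrained completion would fit this shape naturally: fixed elements can be encoded into some of the query slots while the remaining slots are left free for the decoder to fill.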
Losses and Optimization: The total loss combines a reconstruction term (cross-entropy for element classification and a regression loss for coordinates, in the spirit of DETR) with an adversarial term from a discriminator. A differentiable argmax is used to align predicted boxes with ground-truth boxes.
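A minimal sketch of such a combined objective is shown below. It assumes predictions are already aligned with ground truth, treats class 0 as "no element" padding, and uses a non-saturating GAN term; the loss weights and the masking convention are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def layout_loss(cls_logits, pred_boxes, gt_cls, gt_boxes, disc_scores,
                lambda_box=5.0, lambda_adv=0.1):
    """Hypothetical combined objective: DETR-style reconstruction
    (cross-entropy on element classes + L1 on coordinates) plus a
    generator-side adversarial term from discriminator logits."""
    # Classification over all slots, including "no element" (class 0).
    loss_cls = F.cross_entropy(cls_logits.flatten(0, 1), gt_cls.flatten())
    # Regress coordinates only for real (non-padding) elements.
    mask = (gt_cls != 0).unsqueeze(-1).float()
    l1 = F.l1_loss(pred_boxes, gt_boxes, reduction="none") * mask
    loss_box = l1.sum() / (mask.sum() * 4).clamp(min=1)
    # Generator wants the discriminator to score its layouts as real.
    loss_adv = F.binary_cross_entropy_with_logits(
        disc_scores, torch.ones_like(disc_scores))
    return loss_cls + lambda_box * loss_box + lambda_adv * loss_adv

loss = layout_loss(torch.randn(2, 8, 5), torch.rand(2, 8, 4),
                   torch.randint(0, 5, (2, 8)), torch.rand(2, 8, 4),
                   torch.randn(2, 1))
print(loss)
```

The paper replaces DETR's Hungarian matching step with a differentiable argmax for alignment; that step is omitted here for brevity.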
Experiments: We built a dataset of ~60k annotated creative images and 1k clean product images. Evaluation covers traditional overlap/alignment metrics, three novel content-aware metrics, and human assessment. Our method outperforms state-of-the-art baselines on all metrics, especially in human ratings and content-related scores. Qualitative results show adaptive layouts under image cropping/scaling and effective user-constrained generation.
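As a concrete reference point, the classic overlap metric mentioned above can be computed as total pairwise intersection area over total element area. This sketch is illustrative only; the paper's exact metric definitions (and its three content-aware metrics) are not reproduced here.

```python
def overlap_area(box_a, box_b):
    """Intersection area of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    return ix * iy

def pairwise_overlap(boxes):
    """Overlap ratio of a layout: summed pairwise intersection
    area divided by summed element area (0.0 = no overlap)."""
    inter = sum(overlap_area(a, b)
                for i, a in enumerate(boxes) for b in boxes[i + 1:])
    total = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
    return inter / total if total else 0.0

# Two unit boxes overlapping by half of one box's area.
print(pairwise_overlap([(0, 0, 1, 1), (0.5, 0, 1.5, 1)]))  # 0.25
```

Lower values indicate that text, logo, and decorative elements avoid covering one another; content-aware variants additionally penalize covering the salient product region.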
Paper: "Composition-aware Graphic Layout GAN for Visual-textual Presentation Designs" (IJCAI 2022 AI & Arts Track). Download: https://arxiv.org/abs/2205.00303.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.