How AI Diffusion Models Revolutionize E‑commerce Ad Image Creation
This article presents three innovations JD Advertising introduced in 2023: a relation‑aware diffusion model for poster layout, category‑aware background generation, and a planning‑and‑rendering pipeline. Together, these automatically produce high‑quality, scalable, and personalized e‑commerce ad posters, addressing the efficiency, cost, and creative limitations of manual design.
Introduction
E‑commerce advertising images need to capture attention, convey brand values, and build emotional connections, but traditional manual creation is inefficient and costly. Recent AIGC systems still fall short on conveying point‑of‑sale information, scaling across products, personalizing to users, and presenting content effectively. In 2023, JD Advertising proposed a series of innovations to address these challenges.
Poster Layout Generation with a Relation‑Aware Diffusion Model
Technical Background
Generating poster layouts involves predicting the positions and categories of visual elements, which is crucial for aesthetic appeal and information delivery. Manual design is time‑consuming and expensive, prompting research into automatic layout generation.
Early methods focused only on graphic relationships, ignoring visual content, while later content‑aware methods still missed two key factors: the role of text and the geometric relationships among elements.
The proposed relation‑aware diffusion model jointly considers visual‑textual and geometric relationships. By following a noise‑to‑layout paradigm, the model iteratively denoises sampled boxes, extracts RoI features via an image encoder, and employs a Visual‑Text Relation Awareness Module (VTRAM) and a Geometric Relation Awareness Module (GRAM) to incorporate both modalities.
Diffusion‑Based Layout Generation
The diffusion process adds Gaussian noise to a layout, while the denoising process gradually restores a coherent layout, enabling controllable generation through predefined layouts or text modifications.
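The forward (noising) half of this process has a standard closed form. Below is a minimal numpy sketch for normalized layout boxes, assuming the common linear beta schedule; the actual model's schedule and parameterization are not given in the article.

```python
import numpy as np

def forward_diffuse(boxes, t, betas, rng):
    """Closed-form forward step q(x_t | x_0): add Gaussian noise
    to normalized layout boxes at timestep t."""
    alpha_bar = np.cumprod(1.0 - betas)[t]        # cumulative signal retention
    noise = rng.standard_normal(boxes.shape)
    noisy = np.sqrt(alpha_bar) * boxes + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)             # linear schedule (assumption)
boxes = rng.uniform(size=(5, 4))                  # 5 boxes as (cx, cy, w, h) in [0, 1]
noisy, eps = forward_diffuse(boxes, 999, betas, rng)
```

At the final timestep almost all signal is destroyed, which is what lets the denoiser start from pure noise and iteratively recover a coherent layout.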
Visual‑Text Relation Awareness (VTRAM)
VTRAM aligns visual and textual features by concatenating positional embeddings with RoI features, then applying cross‑attention where visual features serve as queries and textual features as keys and values, producing multimodal fused features.
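The cross‑attention step described above can be sketched as follows. This is an illustrative simplification: the projection matrices, feature dimensions, and multi‑head details are assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def vtram_cross_attention(roi_feats, pos_emb, text_feats, w_q, w_k, w_v):
    """RoI features concatenated with positional embeddings act as queries;
    text tokens supply keys and values; output is the fused multimodal feature."""
    queries = np.concatenate([roi_feats, pos_emb], axis=-1) @ w_q  # (N_box, d)
    keys = text_feats @ w_k                                        # (N_tok, d)
    values = text_feats @ w_v
    attn = softmax(queries @ keys.T / np.sqrt(queries.shape[-1]))
    return attn @ values                                           # (N_box, d)

rng = np.random.default_rng(1)
roi = rng.standard_normal((6, 32))      # 6 RoIs, 32-dim visual features
pos = rng.standard_normal((6, 8))       # positional embeddings
txt = rng.standard_normal((12, 16))     # 12 text tokens
w_q = rng.standard_normal((40, 16))
w_k = rng.standard_normal((16, 16))
w_v = rng.standard_normal((16, 16))
fused = vtram_cross_attention(roi, pos, txt, w_q, w_k, w_v)
```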
Geometric Relation Awareness (GRAM)
GRAM computes relative position features between RoIs, encodes them with sinusoidal embeddings, and normalizes geometric weights via softmax to enhance spatial understanding. Different element types receive distinct positioning strategies, and RoI features are projected to combine visual and categorical information.
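The geometric pipeline (relative positions, sinusoidal encoding, softmax normalization) can be sketched as below. The specific relative features (normalized offsets and log size ratios) and the embedding dimension are assumptions borrowed from common practice in relation networks, not values from the paper.

```python
import numpy as np

def relative_geometry(boxes):
    """Pairwise relative position features between boxes given as (cx, cy, w, h)."""
    cx, cy, w, h = boxes.T
    dx = (cx[None, :] - cx[:, None]) / w[:, None]   # offset normalized by width
    dy = (cy[None, :] - cy[:, None]) / h[:, None]   # offset normalized by height
    dw = np.log(w[None, :] / w[:, None])            # log size ratios
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)      # (N, N, 4)

def sinusoidal_embed(x, dim):
    """Encode scalar geometry features with sinusoids of varying frequency."""
    freqs = 1.0 / (10000.0 ** (np.arange(dim // 2) / (dim // 2)))
    ang = x[..., None] * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

def geometric_weights(boxes, w_g):
    """Softmax-normalized geometric attention weights over the other boxes."""
    emb = sinusoidal_embed(relative_geometry(boxes), 8)     # (N, N, 4, 8)
    logits = emb.reshape(len(boxes), len(boxes), -1) @ w_g  # (N, N)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
boxes = rng.uniform(0.1, 0.9, size=(5, 4))  # positive widths/heights
weights = geometric_weights(boxes, rng.standard_normal(32))
```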
Category‑Common and Personalized Style Background Generation
Technical Background
Product advertising background generation aims to create realistic backgrounds for product cut‑out images, improving click‑through rates. Existing methods fall into text‑to‑image and image‑to‑image paradigms, each with limitations: the former demands careful prompt engineering, while the latter tends to lose layout details.
The proposed method generates backgrounds that inherit layout, composition, color, and style from a reference advertisement image, using a pre‑trained Stable Diffusion model, a Category‑Common Generator (CG), and a Personalized Generator (PG).
Category‑Common Generation
CG extracts product information from the cut‑out image and generates a generic background for the product’s category. It replaces the standard attention module with a mask‑aware attention that incorporates the product mask, enabling direct mapping from category names to style prompts.
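One plausible reading of mask‑aware attention is that the product mask biases the attention scores so that background synthesis does not draw on (and thus does not repaint) product pixels. The sketch below implements that reading in numpy; the exact way the paper injects the mask is not specified here, so treat this as an assumption.

```python
import numpy as np

def mask_aware_attention(q, k, v, product_mask):
    """Attention with scores biased by a flattened product mask: keys at
    product positions are suppressed so the background is generated without
    attending into the product region (illustrative interpretation)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(product_mask[None, :] > 0, -1e9, scores)  # block product keys
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    return attn @ v, attn

rng = np.random.default_rng(3)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((10, 8))
v = rng.standard_normal((10, 8))
mask = np.zeros(10)
mask[:3] = 1  # first 3 spatial positions belong to the product
out, attn = mask_aware_attention(q, k, v, mask)
```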
Personalized Style Generation
PG overlays personalized information from a reference image onto the generic background without requiring textual prompts. PG’s output is filtered by the product mask to ensure style influences only the background region.
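The mask filtering step is simple alpha compositing: style only touches background pixels, and the product region is passed through untouched. A minimal sketch:

```python
import numpy as np

def restrict_style_to_background(stylized, base_with_product, product_mask):
    """Keep product pixels from the base image; apply the personalized
    style output only where the product mask is zero (background)."""
    m = product_mask[..., None]  # (H, W, 1), 1 on product pixels
    return m * base_with_product + (1.0 - m) * stylized

H, W = 4, 4
base = np.ones((H, W, 3))        # toy image with product pixels = 1
stylized = np.zeros((H, W, 3))   # toy stylized output = 0
mask = np.zeros((H, W))
mask[1:3, 1:3] = 1               # product occupies the center
out = restrict_style_to_background(stylized, base, mask)
```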
End‑to‑End Product Poster Generation via Planning and Rendering
Technical Background
High‑quality product posters require coherent element layout and harmonious backgrounds. Existing pipelines that simply combine image‑inpainting and layout generation suffer from background complexity and limited layout diversity.
The proposed solution mimics human designers by separating planning (layout prediction) and rendering (image synthesis).
Layout Generation with a Planning Network
PlanNet encodes product images and textual descriptions, then uses a Layout Decoder (two fully‑connected layers and N transformer blocks) to predict positions for the product and other visual elements.
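The final prediction head can be sketched as below: two fully connected layers mapping fused image/text features to normalized boxes. The transformer blocks and the encoders are omitted, and the dimensions and activations are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LayoutHead:
    """Two fully connected layers mapping fused features to a normalized
    box (cx, cy, w, h) per element token (transformer blocks omitted)."""

    def __init__(self, d_in, d_hidden, rng):
        self.w1 = rng.standard_normal((d_in, d_hidden)) * 0.02
        self.b1 = np.zeros(d_hidden)
        self.w2 = rng.standard_normal((d_hidden, 4)) * 0.02
        self.b2 = np.zeros(4)

    def __call__(self, feats):
        h = np.maximum(feats @ self.w1 + self.b1, 0.0)  # ReLU
        return sigmoid(h @ self.w2 + self.b2)           # boxes in (0, 1)

rng = np.random.default_rng(4)
head = LayoutHead(d_in=64, d_hidden=128, rng=rng)
boxes = head(rng.standard_normal((7, 64)))  # one box per visual element
```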
Background Generation with a Rendering Network
RenderNet receives the planned layout and product image, encodes layout masks, fuses spatial information via a Spatial Fusion Module, and feeds combined visual and layout features into ControlNet to guide Stable Diffusion, producing the final poster.
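Encoding the planned layout as masks amounts to rasterizing each predicted box into a per‑class binary channel, the kind of spatial conditioning a ControlNet branch consumes. A minimal sketch, with box format and channel layout assumed rather than taken from the paper:

```python
import numpy as np

def rasterize_layout(boxes, labels, n_classes, hw):
    """Turn normalized (cx, cy, w, h) boxes into per-class binary mask
    channels for spatial conditioning."""
    H, W = hw
    masks = np.zeros((n_classes, H, W))
    for (cx, cy, w, h), c in zip(boxes, labels):
        x0, x1 = int((cx - w / 2) * W), int((cx + w / 2) * W)
        y0, y1 = int((cy - h / 2) * H), int((cy + h / 2) * H)
        masks[c, max(y0, 0):y1, max(x0, 0):x1] = 1.0
    return masks

boxes = np.array([[0.5, 0.5, 0.5, 0.5],     # product, centered
                  [0.25, 0.25, 0.2, 0.2]])  # a text element, upper left
masks = rasterize_layout(boxes, labels=[0, 1], n_classes=3, hw=(64, 64))
```

These channels would then be fused with the encoded product image before being fed to ControlNet to guide Stable Diffusion.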
Conclusion and Outlook
Technical Summary
The presented solutions address the lack of point‑of‑sale information, scalability, and personalization in AIGC advertising images by (1) building a relation‑aware diffusion model for layout generation, (2) integrating category‑common and personalized style generators into diffusion models, and (3) proposing a planning‑and‑rendering framework (P&R) that jointly optimizes layout and background synthesis.
Future Directions
Future work will focus on improving controllability, enhancing multimodal integration of text, image, and video, and delivering personalized ad creatives tailored to specific user groups.
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among AI, cloud computing, IoT, and related developers. It publishes JD product technology updates, industry content, and tech event news. Embrace technology and partner with developers to envision the future.