How JD Ads Uses Large Language Models to Transform Advertising

This article details JD Advertising's shift from generic to domain‑specific large models, the design of AI‑driven ad agents, the end‑to‑end GRAM retrieval‑alignment system, CTR‑guided AIGC for creatives, ultra‑low‑latency inference techniques, and ARM‑based optimizations that together reshape modern ad marketing.


Background

Large language models (LLMs) have become a core technology for e‑commerce advertising. JD Advertising built a vertically‑specialized LLM ecosystem that covers the entire ad‑marketing workflow, from intent understanding to real‑time serving.

Technical Contributions

Vertical Model Shift – General‑purpose LLMs lack the depth for advertising tasks. JD trained a domain‑specific model that encodes marketing knowledge, seasonal trends, and continuous decision‑making, enabling the system to move from “knowing” to “executing”.

Advertising Intelligent Agents – A multi‑agent architecture (editing, placement, optimization, creative generation) orchestrated via the ReAct framework, an A2A communication protocol, and short‑term/long‑term memory management. A single natural‑language command (e.g., “increase budget of ROI≥2 plans by 20%”) triggers the agents to adjust parameters, allocate budget, and perform real‑time optimization.
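As a concrete illustration, the sketch below shows how such a command could be grounded into a plan update once the agents have parsed it. The Plan fields, the increase_budget helper, and the sample numbers are hypothetical stand‑ins, not JD's actual interfaces.

from dataclasses import dataclass

@dataclass
class Plan:
    plan_id: str
    roi: float            # return on investment of the plan
    daily_budget: float   # current daily budget

def increase_budget(plans, min_roi, pct):
    """Raise daily_budget by `pct` percent for every plan with roi >= min_roi."""
    for p in plans:
        if p.roi >= min_roi:
            p.daily_budget *= 1 + pct / 100
    return plans

# "Increase budget of ROI>=2 plans by 20%" becomes:
plans = [Plan("p1", roi=2.4, daily_budget=1000.0),
         Plan("p2", roi=1.1, daily_budget=500.0)]
increase_budget(plans, min_roi=2.0, pct=20.0)   # p1 -> 1200.0, p2 unchanged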

GRAM: Generative Retrieval and Alignment Model – An end‑to‑end LLM‑driven pipeline that (1) generates a unified code for user queries (query‑code generator) and product titles (product‑code generator), (2) aligns query and product codes using dense and sparse retrieval, (3) performs pre‑ranking and fine‑ranking, and (4) continuously updates the product‑code library with near‑line feedback (clicks, purchases). The full pipeline runs within 30‑50 ms P99 latency.

CTR‑Guided AIGC for Images – A reinforcement‑learning (RL) based image generator that receives an instruction prompt and is optimized by a reward model trained on advertising click‑through‑rate (CTR) data. The reward model captures visual‑aesthetic factors (color, layout) and commercial signals (product positioning) that correlate with higher CTR.

CTR‑Guided AIGC for Videos – A multi‑strategy video agent that (1) plans a video script based on the request and a material library, (2) generates multiple drafts (storyboard, digital‑human narration, image‑to‑video conversion, voice‑over, BGM), (3) evaluates drafts with a visual‑language model and online CTR feedback, and (4) selects the best draft for direct deployment.

Ultra‑Low‑Latency Inference – Three techniques achieve sub‑50 ms P99 latency: (a) prefill–decode (PD) mixed scheduling with asynchronous operators, (b) model‑head pruning and logits masking to reduce the token search space (throughput ↑ 70 %), and (c) heterogeneous CPU‑xPU pools with secure execution to guarantee fast, safe processing.

ARM‑Optimized Service – A three‑level KV‑Cache hierarchy (SSD → DRAM → HBM) accelerates memory access, custom KDNN operators on Kunpeng CPUs provide >30 % speedup over open‑source kernels for MatMul/Conv, and graph‑level compilation with auto‑tuning yields an additional >10 % performance gain on domestic ARM chips.

Advertising Intelligent Agents Architecture

The system contains hundreds of agents; four core agents handle editing, placement, optimization, and creative generation. Coordination relies on:

ReAct framework – Enables agents to reason over data and act on the platform.

A2A protocol – Standardized message format for real‑time state sharing.

Memory management – Short‑term memory stores current commands; long‑term memory retains historical campaign performance and seasonal patterns.

Model adaptation uses supervised fine‑tuning (SFT) for advertising scenarios and RLHF to align outputs with advertiser preferences.
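To ground the coordination mechanisms above, here is a minimal sketch of an A2A‑style message envelope and a split short‑/long‑term memory, assuming dict payloads and a bounded FIFO buffer; all names are illustrative rather than JD's production API.

import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class A2AMessage:
    sender: str      # e.g. "optimization_agent"
    receiver: str    # e.g. "placement_agent"
    intent: str      # e.g. "budget_update"
    payload: dict
    ts: float = field(default_factory=time.time)

class AgentMemory:
    """Short-term memory holds the current command context (bounded FIFO);
    long-term memory persists campaign history keyed by campaign id."""
    def __init__(self, short_capacity=32):
        self.short_term = deque(maxlen=short_capacity)
        self.long_term = {}

    def remember_command(self, msg):
        self.short_term.append(msg)

    def log_campaign(self, campaign_id, record):
        self.long_term.setdefault(campaign_id, []).append(record)

mem = AgentMemory()
mem.remember_command(A2AMessage("editing_agent", "optimization_agent",
                                "budget_update", {"min_roi": 2.0, "pct": 20}))
mem.log_campaign("c42", {"ctr": 0.031, "season": "618"})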

GRAM Model Details

Workflow:

Query‑code generation: the LLM converts the user query (e.g., “lightweight sunscreen”) into a dense code.

Product‑code generation: product titles are encoded into matching codes.

Alignment: sparse and dense retrieval match query codes to product codes.

Pre‑ranking & ranking: a two‑stage ranking produces the final list.

Near‑line feedback loop: click and purchase signals continuously refine the product‑code embeddings.

The entire pipeline is compiled into a single inference graph, allowing 30‑50 ms P99 latency on production traffic.
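The following toy sketch illustrates the alignment step under stated assumptions: the hash‑based encoder stands in for the LLM code generators, and the hybrid score is a simple weighted sum of dense and sparse similarity; none of the function names come from GRAM itself.

import numpy as np

def dense_code(text, dim=64):
    # Toy stand-in for the LLM query-/product-code generator.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def sparse_overlap(query, title):
    # Toy sparse signal: token overlap between query and title.
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / max(len(q), 1)

def align(query, titles, alpha=0.7, k=10):
    qv = dense_code(query)
    scored = []
    for title in titles:
        dense = float(qv @ dense_code(title))   # dense retrieval score
        sparse = sparse_overlap(query, title)   # sparse retrieval score
        scored.append((title, alpha * dense + (1 - alpha) * sparse))
    # Pre-ranking keeps the top-k; fine-ranking would re-score this subset.
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

print(align("lightweight sunscreen",
            ["SPF50 lightweight sunscreen lotion", "winter down jacket"]))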

CTR‑Driven Creative Generation

Image generation uses an instruction prompt (e.g., “generate a sunscreen ad emphasizing SPF 50+ for sensitive skin”) and a reward model trained on billions of CTR records. The LLM iteratively refines the image until the reward score meets a threshold.
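A minimal sketch of that refine‑until‑threshold loop, assuming the generator and the CTR reward model are opaque callables; the 0.8 threshold, feedback string, and round limit are illustrative assumptions.

import random

def generate(prompt, feedback=None):
    # Placeholder for the image generator.
    return f"image({prompt}|{feedback})"

def reward(image):
    # Placeholder for the CTR-trained reward model (visual + commercial factors).
    return random.random()

def refine(prompt, threshold=0.8, max_rounds=5):
    best_img, best_score, feedback = None, -1.0, None
    for _ in range(max_rounds):
        img = generate(prompt, feedback)
        score = reward(img)
        if score > best_score:
            best_img, best_score = img, score
        if score >= threshold:   # reward meets threshold -> accept the image
            break
        feedback = f"score={score:.2f}, improve layout/contrast"
    return best_img

print(refine("sunscreen ad emphasizing SPF 50+ for sensitive skin"))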

Video generation follows a three‑stage process: planning, multi‑strategy generation, and evaluation‑optimization. The evaluation model combines visual‑language quality metrics with online CTR signals to select the optimal draft.
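A sketch of the plan → multi‑strategy drafts → evaluate → select loop; the strategy names mirror the ones listed earlier, while the scoring weights and CTR numbers are invented stand‑ins for the VLM‑quality plus online‑CTR evaluator.

STRATEGIES = ["storyboard", "digital_human", "image_to_video"]

def make_draft(script, strategy):
    return {"strategy": strategy, "video": f"{strategy}:{script}"}

def evaluate(draft):
    # Stand-in: blend a VLM quality score with a (normalized) CTR estimate.
    vlm_quality = {"storyboard": 0.72, "digital_human": 0.81, "image_to_video": 0.77}
    ctr_estimate = {"storyboard": 0.021, "digital_human": 0.034, "image_to_video": 0.028}
    s = draft["strategy"]
    return 0.5 * vlm_quality[s] + 0.5 * (ctr_estimate[s] / 0.04)

script = "15s sunscreen spot: problem, close-up, SPF 50+ claim, call to action"
drafts = [make_draft(script, s) for s in STRATEGIES]
best = max(drafts, key=evaluate)   # the draft that would be deployed
print(best["strategy"])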

Low‑Latency Inference Techniques

Key optimizations:

PD mixed scheduling – Mixing prefill and decode phases in a single schedule reduces queuing delays.

Asynchronous operators – Overlap I/O and compute.

Logits masking & head pruning – Limits the token search space, improving throughput (a minimal sketch follows this list).

CPU‑xPU heterogeneous pool – Securely partitions workloads across CPUs and accelerators.
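As a sketch of the logits‑masking idea with numpy: decoding is constrained to an allowed token set (e.g., the code vocabulary), shrinking the search space. The vocabulary size and allowed ids are illustrative; the 70 % throughput figure is a reported result, not something this toy reproduces.

import numpy as np

def masked_argmax(logits, allowed_ids):
    mask = np.full_like(logits, -np.inf)
    mask[allowed_ids] = 0.0          # keep only the allowed token positions
    return int(np.argmax(logits + mask))

vocab_size = 32_000
logits = np.random.default_rng(0).standard_normal(vocab_size)
allowed = np.array([17, 42, 1234, 8801])   # e.g. code-vocabulary tokens only
print(masked_argmax(logits, allowed))      # always one of `allowed`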

ARM‑Centric Optimizations

A three‑level KV‑Cache hierarchy accelerates token‑cache look‑ups. Custom KDNN kernels on Kunpeng CPUs achieve >30 % speedup for matrix multiplication and convolution compared with oneDNN. Graph‑level compilation with auto‑tuning adds another >10 % gain, enabling real‑time serving on domestic ARM silicon.
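A minimal sketch of a tiered KV‑Cache lookup with promotion, assuming LRU eviction and dict‑backed tiers; the capacities, key format, and promotion policy are illustrative, not the production design.

from collections import OrderedDict

class TieredKVCache:
    def __init__(self, capacities=(4, 16, 64)):  # HBM, DRAM, SSD slots
        self.tiers = [OrderedDict() for _ in capacities]
        self.caps = capacities

    def _put(self, level, key, value):
        tier = self.tiers[level]
        tier[key] = value
        if len(tier) > self.caps[level]:          # evict LRU entry downward
            old_k, old_v = tier.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._put(level + 1, old_k, old_v)

    def get(self, key):
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                self._put(0, key, value)          # promote to the fastest tier
                return value
        return None                                # miss: recompute the KV block

    def put(self, key, value):
        self._put(0, key, value)

cache = TieredKVCache()
cache.put(("req1", 0), "kv-block")
assert cache.get(("req1", 0)) == "kv-block"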

Conclusion

By transitioning from general to vertical LLMs, integrating a multi‑agent framework, deploying the GRAM retrieval‑alignment pipeline, and engineering ultra‑low‑latency, ARM‑optimized inference, JD Advertising demonstrates that large language models can deliver end‑to‑end, real‑time advertising solutions that reduce manual effort, improve click‑through performance, and scale securely across heterogeneous compute resources.

JD advertising presentation
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: large language models, Intelligent Agents, advertising AI, CTR optimization, low latency inference, retrieval alignment
Written by JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.