Generative Large‑Model Architecture for JD Advertising: Practices, Challenges, and Optimization
JD’s advertising platform replaces rule‑based recall with a generative large‑model pipeline that unifies e‑commerce knowledge, multimodal user intent, and semantic IDs across recall, coarse‑ranking, fine‑ranking, and creative optimization. Through quantization, parallelism, caching, and joint generative‑discriminative inference, the system meets sub‑100 ms latency and sub‑¥1‑per‑million‑token cost targets, delivering double‑digit performance gains and paving the way for domain‑specific foundation models.
Overview
In JD’s advertising platform, the retrieval (recall) stage is critical. Traditional rule‑based recall lacks flexibility and fails to capture diverse user needs, while large generative models offer new opportunities but introduce training cost and privacy challenges.
Key Points of the Talk
At the AICon Global AI Development & Application Conference, JD’s Algorithm Director Zhang Zehua presented “JD Advertising Large‑Model Application Architecture Practice”, sharing solutions and lessons for applying large models in advertising.
Generative Retrieval System
The system integrates world knowledge, JD’s e‑commerce data, multimodal product understanding, and user‑intent recognition, coupled with efficient model training and inference pipelines. By quantizing product semantics into discrete IDs, decoding candidates generatively for recall, and optimizing inference performance, recall efficiency is significantly improved.
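Generative recall typically decodes semantic‑ID sequences constrained to valid catalog items, so the model can only emit IDs that exist. The trie‑constrained greedy decode below is a common pattern from the generative‑retrieval literature, sketched with toy scores in place of a real model; the catalog, SKU names, and scoring function are all illustrative assumptions.

```python
# Valid semantic IDs in a toy catalog: each item is a short sequence of codes.
CATALOG_IDS = {(1, 4): "sku-A", (1, 7): "sku-B", (3, 2): "sku-C"}

def valid_next(prefix):
    """Codes that extend `prefix` toward some valid catalog ID (a flat trie)."""
    return {ids[len(prefix)] for ids in CATALOG_IDS
            if len(ids) > len(prefix) and ids[:len(prefix)] == tuple(prefix)}

def toy_score(prefix, code):
    """Stand-in for model logits: here, smaller codes score higher (illustrative only)."""
    return -code

def constrained_greedy_decode(length=2):
    """Greedily pick the highest-scoring code among valid continuations."""
    prefix = []
    for _ in range(length):
        options = valid_next(prefix)
        prefix.append(max(options, key=lambda c: toy_score(prefix, c)))
    return CATALOG_IDS[tuple(prefix)]

best = constrained_greedy_decode()
```

In production, `valid_next` would be backed by a real prefix trie over billions of IDs, and `toy_score` replaced by the decoder’s logits at each step.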
Three‑Stage Pipeline
From a classic advertising workflow, generative techniques are applied in three stages:
Recall & coarse‑ranking – an information‑retrieval problem in which candidate items are generated from massive data rather than looked up by rules.
Fine‑ranking – CTR/CVR models filter and rank candidates.
Information‑completion – multimodal understanding and re‑ranking (creative optimization) refine top results.
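The three stages above can be sketched as a minimal pipeline. The function names, scoring fields, and catalog below are illustrative assumptions, not JD’s actual implementation; in particular, the recall stand‑in filters a list, whereas the real system would decode candidates generatively.

```python
from typing import List, Dict

def recall(query: str, catalog: List[Dict], k: int = 100) -> List[Dict]:
    """Stage 1 (toy stand-in): produce a broad candidate set.
    A generative system would decode semantic IDs instead of filtering."""
    return [item for item in catalog if query in item["title"]][:k]

def fine_rank(candidates: List[Dict], k: int = 10) -> List[Dict]:
    """Stage 2 (toy stand-in): order candidates by a pCTR-like score."""
    return sorted(candidates, key=lambda it: it["pctr"], reverse=True)[:k]

def rerank(ranked: List[Dict]) -> List[Dict]:
    """Stage 3 (toy stand-in): creative optimization / re-ranking,
    here just attaching a chosen creative to each item."""
    return [{**it, "creative": f"ad-for-{it['sku']}"} for it in ranked]

catalog = [
    {"sku": "A1", "title": "running shoes", "pctr": 0.031},
    {"sku": "B2", "title": "trail running shoes", "pctr": 0.052},
    {"sku": "C3", "title": "dress shoes", "pctr": 0.012},
]
results = rerank(fine_rank(recall("running", catalog)))
```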
Data Representation
Semantic ID is introduced as a unified representation for user behavior and e‑commerce knowledge, enabling the model to understand both structured (product attributes) and unstructured (user‑generated images, comments) data.
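One common way to produce such semantic IDs is residual quantization of an item embedding against a stack of codebooks (as in RQ‑VAE‑style approaches); the talk summary does not specify JD’s exact method, and the random codebooks below are purely for illustration, where learned ones would be used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LEVELS, CODEBOOK_SIZE = 8, 3, 16

# One codebook per quantization level (random here; learned in practice).
codebooks = rng.normal(size=(LEVELS, CODEBOOK_SIZE, DIM))

def semantic_id(embedding: np.ndarray) -> tuple:
    """Quantize an item embedding into a tuple of discrete codes, one per
    level, by greedily matching the residual to the nearest codeword."""
    residual = embedding.copy()
    codes = []
    for level in range(LEVELS):
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - codebooks[level][idx]
    return tuple(codes)

item_embedding = rng.normal(size=DIM)
sid = semantic_id(item_embedding)
```

The resulting code tuple acts as a compact token sequence the generative model can read and emit, putting structured and unstructured signals in one vocabulary.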
Engineering Challenges
Two major challenges dominate industrial deployment:
Low latency: inference must stay below ~100 ms, otherwise the result is discarded.
High throughput & cost control: a million‑token inference should cost less than ¥1, otherwise large‑model deployment is not viable.
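A back‑of‑envelope check makes the cost constraint concrete: given a GPU’s hourly price, the ¥1‑per‑million‑token target fixes the minimum sustained throughput. The GPU price below is an assumed placeholder, not a figure from the talk.

```python
# Assumed placeholder: one inference GPU costs ¥20 per hour (illustrative only).
gpu_cost_per_hour = 20.0
target_cost_per_million_tokens = 1.0  # ¥1 per million tokens, from the talk

# Tokens the GPU must serve per hour to stay at or under the target cost.
required_tokens_per_hour = (gpu_cost_per_hour / target_cost_per_million_tokens) * 1_000_000
required_tokens_per_second = required_tokens_per_hour / 3600
```

Under these assumed numbers a single GPU would need to sustain roughly 5,500 tokens per second, which is why the batching, quantization, and caching work below is load‑bearing rather than optional.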
Optimization Layers
Optimization is tackled at three levels:
Single‑node: quantization (FP8 and lower‑bit formats), tensor parallelism, optimized attention kernels (FlashAttention, PagedAttention), and dynamic batching under latency constraints.
Distributed: software‑hardware co‑design, KV‑cache pooling, model‑graph partitioning, and multi‑level caching across CPU RAM and GPU HBM.
Full‑link: edge pre‑computation plus near‑line and offline computation that move work off the latency‑critical path.
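As a toy illustration of the quantization idea at the single‑node level (FP8 kernels need hardware support, so this sketch uses symmetric int8 instead), weights are scaled into an 8‑bit range and dequantized on use; the numbers and shapes are arbitrary.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes and the scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(42)
weights = rng.normal(scale=0.1, size=(4, 4)).astype(np.float32)
q, s = quantize_int8(weights)
recovered = dequantize(q, s)
max_err = float(np.abs(weights - recovered).max())
```

The same scale‑and‑round principle underlies production FP8/int8 paths, where the payoff is smaller weights in HBM and faster matrix multiplies, at the cost of a bounded rounding error (at most half a quantization step here).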
Joint Generative & Discriminative Inference
JD rewrites the generative inference flow in TensorFlow, integrates it with traditional sparse CTR/CVR models, and shares hidden states between the two, achieving a unified inference graph that avoids HBM bottlenecks.
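The hidden‑state sharing idea can be sketched as a single forward pass feeding two heads. This is a simplified numpy illustration, not JD’s TensorFlow graph; the trunk, head names, and random weights are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(7)
HIDDEN, VOCAB = 16, 32

# Shared trunk plus two heads: generative (next-semantic-ID logits) and
# discriminative (a pCTR-style score). Weights are random placeholders.
W_trunk = rng.normal(size=(HIDDEN, HIDDEN)) * 0.1
W_gen = rng.normal(size=(HIDDEN, VOCAB)) * 0.1
W_ctr = rng.normal(size=(HIDDEN, 1)) * 0.1

def unified_forward(features: np.ndarray):
    """Compute the shared hidden state once, then feed both heads, so the
    trunk runs a single time and its activations live in memory once."""
    hidden = np.tanh(features @ W_trunk)             # shared hidden state
    gen_logits = hidden @ W_gen                      # generative head
    pctr = 1.0 / (1.0 + np.exp(-(hidden @ W_ctr)))   # discriminative head
    return gen_logits, pctr

x = rng.normal(size=(1, HIDDEN))
logits, pctr = unified_forward(x)
```

Fusing both heads into one graph is what lets the platform avoid duplicating trunk activations in HBM and running two separate inference passes per request.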
Results & Outlook
The generative approach has been applied across recall, coarse‑ranking, fine‑ranking, creative bidding, and re‑ranking, delivering double‑digit performance gains. Future directions include domain‑specific foundation models for e‑commerce, deeper fusion of generative and discriminative models, and continued co‑design of algorithms and systems.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.