
Evolution and Optimization of JD Retail Advertising Online Model System: From Deep Learning to Distributed Graph Computing and Power Collaboration

The article details JD Retail Advertising's three‑stage evolution of its online model system—deep‑learning era, large‑model era, and power‑collaboration era—highlighting heterogeneous computing optimizations, platform and system capabilities, distributed graph computing, online learning, and dynamic power allocation to dramatically improve algorithm iteration speed and model performance.

JD Retail Technology

As e‑commerce expands, JD Retail Advertising's technology team faces higher demands on iteration efficiency, parameter scale, and compute power, prompting deep optimization of heterogeneous computing frameworks.

1. Current Status

Algorithm strategies are crucial for understanding user behavior and optimizing ad delivery. The model system supports search, recommendation, focus, and external ads, forming the backbone of deep‑learning modeling across the ad chain.

Platform Capabilities

High throughput and concurrency: hundreds of billions of PV, millions of QPS.

Low latency, high reliability: trillions of estimations per second, millisecond latency, 99.99% availability.

Cluster scale: over 10,000 nodes.

Iteration cycle: three releases per day.

System Capabilities

TB‑level model estimation.

CPU/GPU heterogeneous computing.

Minute‑level online learning.

Real‑time graph computation with billions of nodes and edges.

Hierarchical compute: real‑time, offline, near‑line.

2. Development Stages

2.1 Deep‑Learning Era – Unified Architecture & Faster Iteration

Early systems lacked unified architecture, leading to high business integration cost, fragmented features, and tight algorithm‑engine coupling. The solution introduced three core modules:

Model Access Service: traffic control and decoupling from business logic.

Feature Compute Service: unified data management and feature extraction.

Model Inference Service: unified inference, model updates, and logging.

Key functions of model access include dynamic traffic routing, parallel traffic distribution, and traffic protection.
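As a rough sketch of these functions, the access layer can be pictured as a weighted router: each business line registers model variants, and requests are distributed across them for parallel experimentation. All names below are hypothetical illustrations, not JD's actual code.

```python
import random

class ModelAccessService:
    """Hypothetical sketch of an access layer that decouples traffic
    routing from business logic: each business line registers model
    variants with weights, and requests are routed accordingly."""

    def __init__(self):
        self.routes = {}  # business line -> [(model_name, weight), ...]

    def register(self, business, model_name, weight):
        self.routes.setdefault(business, []).append((model_name, weight))

    def route(self, business, rng=random.random):
        """Pick a model variant by weighted random draw
        (parallel traffic distribution for experiments)."""
        variants = self.routes[business]
        total = sum(w for _, w in variants)
        r = rng() * total
        for name, weight in variants:
            r -= weight
            if r <= 0:
                return name
        return variants[-1][0]

# Example: 90% of search traffic to the base CTR model, 10% to an experiment.
svc = ModelAccessService()
svc.register("search", "ctr_base", 0.9)
svc.register("search", "ctr_exp", 0.1)
```

In a production access layer the same registration point would also carry traffic-protection logic (rate limits, fallbacks), which is omitted here for brevity.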

2.2 Large‑Model Era – Scaling Model Size & Timeliness

Introducing massive models (Transformer‑based) raised complexity, parameter scale (from billions to hundreds of billions), and reduced timeliness. A distributed graph‑computing architecture was built to boost compute power, support larger models, and enable minute‑level online learning.

Highlights of the distributed graph architecture:

Computation layering based on compute‑intensive vs. I/O‑intensive tasks.

Storage layering for sparse and dense parameters, leveraging CPU/GPU strengths.

Integrated online‑offline compute for higher efficiency.
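The storage-layering idea can be illustrated with a toy partitioner: huge, lookup-heavy sparse embedding tables go to a CPU-side distributed store, while small, compute-heavy dense layers stay in GPU memory. This is a minimal sketch under assumed names; the real system's placement policy is far richer.

```python
def partition_params(params, sparse_threshold=1_000_000):
    """Hypothetical sketch of sparse/dense storage layering.

    params: dict of parameter name -> embedding row count.
    Returns (cpu_store, gpu_store) placement dicts.
    """
    cpu_store, gpu_store = {}, {}
    for name, rows in params.items():
        if rows >= sparse_threshold:
            cpu_store[name] = rows   # I/O-intensive sparse lookups -> CPU store
        else:
            gpu_store[name] = rows   # compute-intensive dense math -> GPU memory
    return cpu_store, gpu_store
```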

Results: online compute capacity grew 10×, offline training throughput doubled; model parameter scale grew 8×, search CTR +3%, recommendation CTR +8%.

Online Learning

Incremental parameter updates shrink update interval from days to minutes.

High‑availability architecture with fast rollback.

Impact: minute‑level streaming training, search CTR +10.47% during Double‑11 promotion.
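The mechanics of incremental updates with fast rollback can be sketched as shipping only the changed parameters as a versioned delta, keeping a snapshot per version so a bad update can be reverted immediately. The class below is a hypothetical illustration, not the production implementation.

```python
import copy

class OnlineModel:
    """Hypothetical sketch of minute-level incremental updates:
    only changed parameters are shipped as a delta, and each delta
    is versioned so a bad update can be rolled back quickly."""

    def __init__(self, params):
        self.params = dict(params)
        self.history = []  # stack of (version, snapshot) for rollback

    def apply_delta(self, version, delta):
        self.history.append((version, copy.deepcopy(self.params)))
        self.params.update(delta)

    def rollback(self):
        """Revert the most recent delta; returns the rolled-back version."""
        version, snapshot = self.history.pop()
        self.params = snapshot
        return version
```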

Graph Computing

Compute‑storage separation, clear extensible design.

Distributed cluster storage with millisecond latency.

Support for billion‑node, hundred‑billion‑edge dynamic graphs.

Impact: billion‑node graph storage, recommendation CTR +3%, homepage recall CTR +2%.
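A single-machine stand-in for such a dynamic graph store is an adjacency map with incremental edge updates, bounded per-node neighbor lists, and recency-ordered lookups; the distributed version shards this across the cluster for millisecond latency. Names and the eviction policy below are illustrative assumptions.

```python
from collections import defaultdict

class DynamicGraphStore:
    """Hypothetical sketch of a dynamic graph store: adjacency lists
    with timestamped edges, keeping only the most recent neighbors
    per node to bound memory."""

    def __init__(self, max_neighbors=100):
        self.adj = defaultdict(list)  # node -> [(timestamp, neighbor), ...]
        self.max_neighbors = max_neighbors

    def add_edge(self, src, dst, ts):
        nbrs = self.adj[src]
        nbrs.append((ts, dst))
        if len(nbrs) > self.max_neighbors:
            nbrs.sort(reverse=True)          # newest first
            del nbrs[self.max_neighbors:]    # evict the oldest edges

    def neighbors(self, src, k=10):
        """Return the k most recent neighbors of src, newest first."""
        return [dst for _, dst in sorted(self.adj[src], reverse=True)[:k]]
```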

2.3 Power‑Collaboration Era – Fine‑Grained Compute Coordination

Horizontal acceleration via distributed graph and GPU hardware reached limits; vertical full‑link power coordination was needed.

New industrial‑grade deep‑learning architecture introduces:

Distributed graph as the base layer.

Integrated online‑offline compute with hardware‑software co‑design.

Hierarchical power layers: real‑time, near‑line, offline.

Key components:

Pre‑Compute: expands real‑time behavior modeling from hundreds to thousands of events without increasing latency, boosting search CTR +6%.

Near‑Line Compute: enables full‑library retrieval with deep models, raising recommendation CTR +3%.

Dynamic Power Allocation: multi‑objective load balancing reduces resource imbalance, optimizing machine resources of more than 9,000 CPU cores.
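One simple way to picture dynamic power allocation is a greedy allocator with diminishing returns: each unit of compute goes to whichever business line currently promises the highest marginal value. This is a stand-in sketch under assumed names and a made-up decay factor, not the multi-objective algorithm actually deployed.

```python
import heapq

def allocate_compute(demands, total_units, decay=0.9):
    """Hypothetical greedy allocator for compute units.

    demands: {name: (value_per_unit, max_units)}.
    Repeatedly gives one unit to the line with the highest remaining
    marginal value, which decays as a line accumulates units.
    """
    heap = [(-value, name, cap) for name, (value, cap) in demands.items()]
    heapq.heapify(heap)
    alloc = {name: 0 for name in demands}
    for _ in range(total_units):
        if not heap:
            break  # every line has hit its cap
        neg_value, name, cap = heapq.heappop(heap)
        alloc[name] += 1
        if alloc[name] < cap:
            # diminishing returns: next unit for this line is worth less
            heapq.heappush(heap, (neg_value * decay, name, cap))
    return alloc
```

With uniform decay the highest-value line absorbs compute until its marginal value drops below the competition, which is the basic imbalance-reducing behavior the bullet above describes.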

Future plans include dynamic power distribution based on traffic value and user segmentation to maximize limited resource returns.

Overall, the three‑stage evolution has dramatically improved iteration speed, model scale, and compute efficiency, supporting rapid development across multiple business lines.

Tags: advertising, AI, large models, heterogeneous computing, distributed graph, online modeling, power collaboration
Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
