How JD Retail’s xLLM Architecture Revolutionizes AI Inference for E‑Commerce

At GAITC 2025, Zhang Ke, AI Infra lead at JD Retail, detailed the challenges of e‑commerce AI inference and introduced the xLLM edge‑cloud unified large‑model architecture, highlighting adaptive scheduling, online‑offline unified scheduling, multi‑layer pipelines, and agent collaboration that boost performance and cut costs.

JD Cloud Developers

AI Inference Challenges in E‑Commerce

Large‑model technology is advancing rapidly, becoming the new foundation for industrial intelligence and moving AI from “usable” to “useful, controllable, and trustworthy.” In e‑commerce, demand falls into three main directions: Generative AI (e.g., product images, short videos, marketing content, digital humans), Agentic AI (e.g., AI customer service, AI operations management, warehouse‑logistics optimization, interactive recommendation), and Physical AI (e.g., sorting robots, smart spaces, autonomous driving). These diverse scenarios create technical challenges for AI inference: varied input types, differing user priorities, task allocation between edge (mobile) and cloud, collaborative optimization, model compression, and performance tuning.


JD Retail and Tsinghua University Launch xLLM Edge‑Cloud Unified Large‑Model Inference Architecture

Since 2022, JD has collaborated with Tsinghua University on computer vision, machine learning, recommendation systems, and big data. This year the partnership expanded to large‑model inference engine localization and multimodal recommendation models, combining JD’s engineering resources with Tsinghua’s academic strengths to turn research into production systems.

Joint work on model quantization, edge‑cloud collaborative inference, and the xLLM architecture tackles performance optimization, enabling large‑model deployment at scale in complex e‑commerce environments. Edge‑cloud joint deployment enables efficient collaborative inference: cloud‑side models are continuously refined using user feedback, while lightweight edge models are updated in real time, forming a closed‑loop evolution system that adapts to device resource constraints.
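As a minimal sketch of such an edge‑cloud split (the routing policy, threshold, and all names below are illustrative assumptions, not JD's actual implementation), a request router might decide between the on‑device lightweight model and the cloud model like this:

```python
def route_request(prompt: str, privacy_sensitive: bool,
                  edge_max_tokens: int = 512) -> str:
    """Illustrative routing policy for edge-cloud collaborative inference:
    keep privacy-sensitive or short requests on the device-side lightweight
    model, and escalate longer, more complex requests to the cloud model."""
    if privacy_sensitive:
        return "edge"    # sensitive data never leaves the device
    if len(prompt.split()) <= edge_max_tokens:
        return "edge"    # small enough for the lightweight on-device model
    return "cloud"       # large/complex request: use the full cloud model
```

In the closed loop described above, cloud responses and user feedback would also drive continuous refinement of both models; that training loop is omitted here.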


The xLLM architecture explores four technical aspects:

Adaptive scheduling optimization: dynamically adjust the ratio of Prefill and Decode nodes to provide elastic PD (Prefill/Decode) capacity.

Online‑offline unified scheduling: real‑time, load‑aware scheduling routes offline requests onto spare online serving capacity, enabling request‑level co‑location of online and offline workloads.

Multi‑layer pipeline execution: Maximize resource utilization through asynchronous pipelines for model execution, layer‑wise computation and communication, and parallel memory access.

Edge‑cloud agent collaboration: edge agents handle simple tasks and privacy‑sensitive data, while cloud agents continuously feed improvements back, enhancing edge capabilities via an efficient agent protocol.
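The adaptive Prefill/Decode rebalancing in the first point can be sketched as follows. The queue‑pressure heuristic and every name here are illustrative assumptions, not the actual xLLM scheduler: it simply splits a fixed node pool between the two roles in proportion to their current backlogs.

```python
from dataclasses import dataclass

@dataclass
class PoolStats:
    prefill_queue: int   # requests waiting for prompt processing
    decode_queue: int    # active token-generation streams

def rebalance_pd_nodes(stats: PoolStats, total_nodes: int,
                       min_prefill: int = 1, min_decode: int = 1) -> tuple[int, int]:
    """Split a fixed node pool between Prefill and Decode roles
    in proportion to current queue pressure, keeping at least one
    node in each role."""
    pressure = stats.prefill_queue + stats.decode_queue
    if pressure == 0:
        prefill = total_nodes // 2          # idle: fall back to an even split
    else:
        prefill = round(total_nodes * stats.prefill_queue / pressure)
    # clamp so both roles keep their minimum node count
    prefill = max(min_prefill, min(total_nodes - min_decode, prefill))
    return prefill, total_nodes - prefill
```

A real scheduler would also account for migration cost when reassigning nodes and would smooth the signal over time rather than react to instantaneous queue lengths.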

Deployed internally, the architecture has significantly accelerated response times and reduced compute costs in interactive shopping guidance, product comparison, summarization, and recommendation, cutting inference cost by up to 70% while improving model understanding and user engagement.

Future Directions for AI Inference

JD plans to increase investment in domestic AI infrastructure, building an autonomous, controllable ecosystem. The focus will be on breaking the “impossible triangle” of scale, efficiency, and cost; advancing asynchronous multi‑agent self‑evolution across edge and cloud; and enhancing explainability and debuggability of distributed inference.

Through deep adaptation to mainstream domestic chips and a self‑developed inference framework, JD aims to achieve internationally competitive efficiency and to collaborate with industry peers in pushing technical boundaries.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: e-commerce, Model Optimization, AI inference, large model, edge cloud, xLLM
Written by

JD Cloud Developers

JD Cloud Developers is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT, and related developers. It publishes JD product technical information, industry content, and tech event news.