How JD Retail’s xLLM Architecture Revolutionizes AI Inference for E‑Commerce
The article details JD Retail’s collaboration with Tsinghua University to build the xLLM edge‑cloud unified large‑model inference framework, addressing e‑commerce AI challenges such as diverse inputs, task scheduling, model compression, and cost, while outlining future research directions and performance gains.
Challenges of AI Inference in E‑Commerce
Large‑model technology is rapidly advancing and becoming the new foundation for industrial intelligence, pushing AI from "usable" to "useful, controllable, and trustworthy." In e‑commerce, three main demand directions emerge: Generative AI (e.g., product image generation, short videos, marketing content, digital humans), Agentic AI (e.g., AI customer service, operation management, warehouse optimization, interactive recommendation), and Physical AI (e.g., sorting robots, smart spaces, autonomous driving). These diverse scenarios create technical challenges for AI inference, including varied input types, differing user priorities, task allocation between edge devices and cloud servers, collaborative optimization, and model compression and performance tuning.
JD Retail & Tsinghua Launch xLLM Edge‑Cloud Unified Large‑Model Inference Architecture
Since 2022, JD Retail and Tsinghua University have cooperated on computer vision, machine learning, recommendation systems, and big data, launching over ten joint research projects. This year they expanded to frontier topics such as localizing large‑model inference engines on domestic hardware and multimodal recommendation models, aiming to combine industrial resources and academic strengths to translate research into production.
The partnership introduced the "xLLM" edge‑cloud unified inference architecture, tackling performance optimization for inference engines and enabling large‑model deployment at scale in complex e‑commerce environments. By jointly deploying on edge and cloud, the system achieves efficient collaborative inference, continuously refines cloud models with user feedback, and updates lightweight edge models in real time, forming a closed‑loop evolution that boosts performance in real scenarios. The architecture also adapts to varying device resource constraints, allowing broader reuse of large models.
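The edge‑cloud division of labor described above can be sketched as a simple routing policy. The `Request` fields, the complexity threshold, and the decision rule below are illustrative assumptions, not the actual xLLM protocol:

```python
from dataclasses import dataclass

# Hypothetical request descriptor; field names are illustrative,
# not taken from the xLLM framework.
@dataclass
class Request:
    complexity: float        # estimated task difficulty, 0.0 to 1.0
    privacy_sensitive: bool  # whether the data must stay on-device

def route(request: Request, edge_threshold: float = 0.3) -> str:
    """Decide whether a request runs on the edge or in the cloud.

    Privacy-sensitive data stays on-device with the lightweight edge
    model; simple tasks also stay on the edge, while complex ones are
    escalated to the full cloud model.
    """
    if request.privacy_sensitive:
        return "edge"
    return "edge" if request.complexity <= edge_threshold else "cloud"

print(route(Request(complexity=0.1, privacy_sensitive=False)))  # edge
print(route(Request(complexity=0.8, privacy_sensitive=False)))  # cloud
print(route(Request(complexity=0.8, privacy_sensitive=True)))   # edge
```

In the closed loop the article describes, cloud-side results on escalated requests would also be logged and distilled back into the edge model; that feedback path is omitted here for brevity.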
The xLLM architecture explores four technical aspects:
Adaptive scheduling optimization: Dynamically adjusts the ratio of Prefill and Decode nodes to provide elastic PD capabilities.
Unified online/offline scheduling: Schedules offline requests with real‑time awareness of online load, enabling request‑level mixing of online and offline workloads.
Multi‑layer pipeline execution: Maximizes resource utilization through asynchronous pipelines across layers, compute units, and memory accesses.
Edge‑cloud Agent collaboration: Edge agents handle simple tasks and privacy‑sensitive data, while cloud agents continuously feed improvements back to the edge via an efficient agent protocol.
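The first of these aspects, elastic PD scheduling, can be illustrated with a minimal sketch. The proportional policy and parameter names below are assumptions for illustration; a production scheduler would also weigh batch sizes, KV‑cache pressure, and latency SLOs:

```python
def rebalance_pd(total_nodes: int, prefill_queue: int, decode_queue: int,
                 min_nodes: int = 1) -> tuple[int, int]:
    """Split a fixed node pool between Prefill and Decode roles.

    Allocates nodes in proportion to the observed queue depths, keeping
    at least `min_nodes` in each role so neither stage starves. This is
    a toy policy for illustration, not xLLM's actual algorithm.
    """
    total_load = prefill_queue + decode_queue
    if total_load == 0:
        half = total_nodes // 2
        return half, total_nodes - half
    prefill = round(total_nodes * prefill_queue / total_load)
    # Clamp so both roles keep a minimum allocation.
    prefill = max(min_nodes, min(prefill, total_nodes - min_nodes))
    return prefill, total_nodes - prefill

# With prefill-heavy load, more nodes shift to the Prefill role,
# and vice versa as decode backlogs grow.
print(rebalance_pd(10, prefill_queue=30, decode_queue=70))  # (3, 7)
print(rebalance_pd(10, prefill_queue=0, decode_queue=100))  # (1, 9)
```

Re-running such a policy on a short interval is what gives the "elastic" behavior: the PD ratio tracks the workload instead of being fixed at deployment time.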
Deployed internally across scenarios such as interactive shopping assistance, product comparison, summarization, and recommendation, the architecture significantly speeds responses, reduces compute costs, and enhances user engagement. In core product understanding, it improves model comprehension and can cut inference costs by up to 70%.
Future Thoughts on AI Inference
Building on current research, JD Retail will further invest in domestic AI infrastructure to create an autonomous, controllable ecosystem. Future focus includes breaking the "impossible triangle" of scale, efficiency, and cost; advancing asynchronous multi‑agent evolution across edge and cloud; and improving explainability and debuggability of distributed inference.
The goal is deep adaptation to mainstream domestic chips and development of a self‑developed large‑model inference framework that reaches international performance standards, while collaborating with industry peers to push technical boundaries.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
