JD Donates Oxygen xLLM Inference Engine to OpenAtom, Boosting China’s AI Infra Ecosystem
On June 24, 2026, JD announced the donation of its Oxygen xLLM large-model inference engine to the OpenAtom Open Source Foundation. The announcement covers the engine's decoupled service/engine architecture, its performance and heterogeneous-chip support, reported gains in e-commerce, power-grid and public-safety deployments, and a roadmap for ecosystem co-building and standards work.
Open‑source release
On 24 June 2026 JD transferred the copyright, patents, trademark and related rights of its self‑developed large‑model inference engine Oxygen xLLM to the OpenAtom Open Source Foundation under the Apache 2.0 license.
Architecture
Oxygen xLLM adopts a design that decouples the service layer from the engine layer. The xLLM-Service layer provides unified scheduling of online and offline tasks, dynamic PD (prefill/decode) separation to absorb traffic spikes, a global KV cache, and fast fault recovery to keep large-scale production deployments stable. The xLLM-Engine layer implements multi-stage pipelines that overlap computation with communication, adaptive graph execution modes, efficient memory management, and optimizations for MoE, speculative decoding and generative-recommendation workloads.
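The idea behind dynamic PD separation can be illustrated with a toy rebalancer, assuming PD refers to the prefill and decode phases of inference, as is common in LLM serving. This is a minimal sketch, not xLLM's scheduler: `PoolState`, `rebalance`, and the `ratio_threshold` heuristic are hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Hypothetical snapshot of a serving cluster split into two pools."""
    prefill_instances: int
    decode_instances: int
    prefill_queue: int  # requests waiting for prompt processing
    decode_queue: int   # sequences waiting for a decode slot


def rebalance(state: PoolState, min_per_pool: int = 1,
              ratio_threshold: float = 2.0) -> PoolState:
    """Move one instance toward the pool whose per-instance backlog is
    more than `ratio_threshold` times the other pool's backlog."""
    p, d = state.prefill_instances, state.decode_instances
    prefill_load = state.prefill_queue / max(p, 1)
    decode_load = state.decode_queue / max(d, 1)

    if prefill_load > ratio_threshold * decode_load and d > min_per_pool:
        p, d = p + 1, d - 1   # prompt-heavy spike: borrow a decode instance
    elif decode_load > ratio_threshold * prefill_load and p > min_per_pool:
        p, d = p - 1, d + 1   # generation-heavy spike: borrow a prefill instance
    return PoolState(p, d, state.prefill_queue, state.decode_queue)


# A burst of long prompts shifts one instance from decode to prefill.
spike = PoolState(prefill_instances=2, decode_instances=6,
                  prefill_queue=40, decode_queue=6)
print(rebalance(spike))
```

A production scheduler would also weigh SLO targets, KV-cache transfer cost and the time needed to switch an instance's role; the sketch only shows the direction of the adjustment.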
Hardware abstraction hides differences among GPU, NPU and MLU, allowing mixed deployment of LLM, VLM, DiT, text‑to‑image/video and generative‑recommendation models on various domestic AI chips.
Key technical capabilities
Architecture innovation – described by JD as the first inference framework that separates scheduling (service) from execution (engine), so the two components can evolve independently.
Performance breakthrough – multi-level pipelines, adaptive graph mode and dynamic PD separation increase throughput and resource utilization while meeting strict SLO constraints; JD reports that this surpasses existing state-of-the-art inference frameworks.
Heterogeneous unification – a unified inference abstraction layer masks hardware and model differences, supporting multiple model types and mixed deployment of domestic chips.
High‑availability – global KV‑cache management, distributed fast‑recovery, health monitoring and automatic inspection ensure stable operation at production scale.
Domestic-chip adaptation – a single framework covers a range of Chinese AI chips, addressing what JD calls the gap in “heterogeneous-chip unified inference”.
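Global KV-cache management, listed above, is often built around a shared index that records which instance already holds the cache for a given token prefix, so a new request can be routed to reuse it. The following is a minimal sketch under that assumption; `GlobalKVIndex`, `block_keys` and the block size of 4 are hypothetical and chosen for illustration, not taken from xLLM.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Tuple

BLOCK = 4  # tokens per cache block (illustrative value)


def block_keys(tokens: Sequence[int], block: int = BLOCK) -> List[str]:
    """Return one chained hash per full block, so each key identifies
    the entire prefix up to and including that block."""
    keys: List[str] = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % block  # ignore the partial tail
    for i in range(0, full, block):
        h.update(",".join(map(str, tokens[i:i + block])).encode() + b";")
        keys.append(h.hexdigest())
    return keys


class GlobalKVIndex:
    """Maps prefix-block keys to the instance that holds the cached KV."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def publish(self, instance: str, tokens: Sequence[int]) -> None:
        """Record that `instance` holds KV blocks for this token sequence."""
        for key in block_keys(tokens):
            self._owner.setdefault(key, instance)

    def longest_prefix(self, tokens: Sequence[int]) -> Tuple[int, Optional[str]]:
        """Return (number of cached prefix tokens, owning instance)."""
        matched, owner = 0, None
        for n, key in enumerate(block_keys(tokens), start=1):
            if key not in self._owner:
                break
            matched, owner = n * BLOCK, self._owner[key]
        return matched, owner


index = GlobalKVIndex()
index.publish("node-a", [1, 2, 3, 4, 5, 6, 7, 8])
print(index.longest_prefix([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Chaining the hash across blocks means a match on block n implies a match on every earlier block, which keeps the lookup to a single pass over the request's prefix.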
Industrial validation
E-commerce customer service: cluster utilization increased by more than 35% and P99 latency decreased by 28% during high-traffic promotions.
Power-grid inspection: inspection efficiency improved about 3×, the outage rate fell 30%, and emergency-repair speed rose 20%.
Public-safety edge inference: inspection efficiency grew 227%, concurrent requests rose 127%, and time-to-first-token shortened by 50%.
Community adoption
Since open‑sourcing, the project has received over 1.4 k GitHub stars, 235 forks and contributions from major domestic chip and model vendors.
Roadmap
Planned milestones for the end of 2026 include full multimodal support (text-to-image, text-to-video and Omni models), comprehensive adaptation to mainstream Chinese chips, and a commercial enterprise edition. An “industry penetration and standard-leadership” phase follows in 2027, with the goal of making Oxygen xLLM the de facto inference standard for Chinese AI chips.
Repository
https://github.com/jd-opensource/xllm
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.
