Artificial Intelligence 8 min read

JD Donates Oxygen xLLM: Open‑Source Large‑Model Inference Engine Boosts China’s AI Infrastructure

JD announced the donation of its Oxygen xLLM inference engine to the OpenAtom Open‑Source Foundation, detailing its service‑engine decoupled architecture, performance breakthroughs across e‑commerce, power and public‑safety workloads, and a roadmap to expand the open‑source AI ecosystem.

JD Cloud Developers

Jun 25, 2026

JD Donates Oxygen xLLM: Open‑Source Large‑Model Inference Engine Boosts China’s AI Infrastructure

Open‑source donation

On 24 June 2026 JD transferred the Oxygen xLLM large‑model inference engine—including copyrights, patents, trademarks and related rights—to the OpenAtom Open‑Source Foundation under the Apache 2.0 license.

Engineering‑Intelligence vision

The next stage of AI infrastructure is described as Engineering Intelligence (EI): a stack that can sense workload characteristics, generate optimal execution plans automatically, and perform self‑aware scheduling and end‑to‑end self‑optimisation.

Architecture

Oxygen xLLM adopts a service‑engine decoupling architecture.

Service layer (xLLM‑Service) : unified elastic scheduling for online and offline tasks, dynamic PD (parameter‑distribution) separation to handle traffic spikes, global KV cache and fast fault recovery for large‑scale production stability.

Engine layer (xLLM‑Engine) : multi‑level pipelines that overlap compute and communication, adaptive graph mode and efficient memory management to handle dynamic inputs and GPU memory allocation, specialised optimisations for MoE, speculative decoding and generative‑recommendation scenarios.

Hardware and access

Provides a unified AI Gateway and an OpenAI‑compatible SDK. Native execution is supported on GPU, NPU and MLU, covering a wide range of domestic AI chips.

Technical highlights

Architectural innovation – service‑engine decoupling enables independent evolution of scheduling and computation.

Performance breakthrough – multi‑level pipelines, adaptive graph mode and dynamic PD separation significantly improve throughput and resource utilisation while meeting strict SLO constraints, surpassing existing state‑of‑the‑art inference frameworks.

Heterogeneous unification – a unified inference abstraction masks hardware and model differences, supporting LLM, VLM, DiT, text‑to‑image/video and generative‑recommendation models on mixed domestic chips.

High‑availability guarantees – global KV cache management, distributed fast‑fail recovery, health monitoring and automatic inspection ensure stable large‑scale production.

Domestic adaptation – a single framework covers multiple domestic chips, filling the gap of “heterogeneous chip unified inference” and lowering deployment barriers.

Industrial validation

In JD e‑commerce customer‑service models, cluster utilisation increased by more than 35 % and P99 latency decreased by 28 %.

In power‑inspection scenarios, efficiency improved three‑fold, outage rates fell by 30 %, and emergency‑repair speed increased by 20 %.

In public‑safety edge inference, inspection efficiency grew by 227 %, concurrency rose by 127 %, and time‑to‑first‑trace was cut by 50 %.

Community adoption

The project’s GitHub repository https://github.com/jd-opensource/xllm has attracted over 1.4 k stars, 235 forks, and participation from major domestic chip and model vendors.

Roadmap

Planned milestones for 2026 include full multimodal support (text‑to‑image, video, Omni), comprehensive adaptation of mainstream domestic chips, and the launch of commercial enterprise services. By 2027 the contributor base is expected to reach around 200, with a focus on industry penetration and establishing Oxygen xLLM as a de‑facto standard for domestic chip inference.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance optimization large model inference open source AI Infrastructure heterogeneous computing engineering intelligence Oxygen xLLM

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.