JD Donates Oxygen xLLM Large‑Model Inference Engine to OpenAtom Foundation to Boost Domestic AI Infra

JD donated its self‑developed Oxygen xLLM large‑model inference engine to the OpenAtom Open Source Foundation under Apache 2.0, highlighting its service‑engine decoupled architecture, heterogeneous‑chip support, proven performance gains in e‑commerce, power and public‑safety use cases, and a roadmap to become the domestic AI‑infra standard.

JD Tech

Background and Vision

On 2026‑06‑25 JD donated its self‑developed large‑model inference engine Oxygen xLLM to the OpenAtom Open Source Foundation under the Apache 2.0 license, transferring copyright, related patents, trademark and associated rights.

Zhang Ke, head of AI Infra & Big Data Computing at JD, described the next stage of AI infrastructure as Engineering Intelligence (EI): scheduling systems autonomously perceive workload characteristics, inference engines automatically generate optimal execution plans based on model structure and hardware traits, and the entire AI pipeline becomes self‑aware, self‑deciding and self‑optimising.

Problem Space

Production‑grade large‑model deployment faces three core challenges:

SLO compliance and resource efficiency are hard to achieve simultaneously.

Hardware potential is not fully exploited.

Heterogeneous scenarios are hard to coordinate: tidal traffic patterns, evolving MoE architectures, and multi‑model, multi‑chip environments.

Architecture – Service‑Engine Decoupling

Oxygen xLLM is presented as the first framework to separate the Service layer (xLLM‑Service) from the Engine layer (xLLM‑Engine).

Service Layer (xLLM‑Service)

Unified elastic scheduling for online and offline tasks, balancing SLO guarantees with cluster utilisation.

Dynamic PD (prefill/decode) separation to absorb traffic spikes.

Global KV cache and rapid fault recovery to ensure large‑scale production availability.
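The scheduling ideas in this list can be illustrated with a toy sketch. It is not xLLM‑Service code: `Request`, `plan_batch` and `split_roles` are hypothetical names, the token budget is a stand‑in for real capacity accounting, and the proportional split is only one plausible policy for rebalancing prefill and decode instances.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    name: str
    tokens: int   # estimated token budget the request consumes
    online: bool  # True for latency-sensitive (SLO-bound) traffic


def plan_batch(pending: List[Request], capacity: int) -> List[Request]:
    """Admit online requests first, then backfill with offline work."""
    batch, used = [], 0
    # sorted() is stable, so arrival order is kept within each class.
    for req in sorted(pending, key=lambda r: not r.online):
        if used + req.tokens <= capacity:
            batch.append(req)
            used += req.tokens
    return batch


def split_roles(instances: int, prefill_load: int, decode_load: int) -> Tuple[int, int]:
    """Divide instances between prefill and decode in proportion to backlog."""
    if instances < 2:
        raise ValueError("need at least one instance per role")
    total = prefill_load + decode_load
    if total == 0:
        prefill = instances // 2
    else:
        prefill = round(instances * prefill_load / total)
    # Always keep at least one instance in each role.
    prefill = min(max(prefill, 1), instances - 1)
    return prefill, instances - prefill
```

Under these assumptions, a burst of long prompts raises `prefill_load`, so more instances shift to the prefill role until the backlog drains, while offline requests only run in the capacity that online traffic leaves unused.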

Engine Layer (xLLM‑Engine)

Multi‑level pipelines that fully overlap computation and communication.

Adaptive graph mode with efficient memory management to handle dynamic inputs and GPU memory allocation.

Specialised optimisations for MoE, speculative decoding, generative recommendation and other scenarios, unlocking hardware potential.
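Why overlapping computation and communication helps can be seen with a back‑of‑the‑envelope model. The sketch below assumes a simple two‑stage pipeline in which each micro‑batch is computed and then communicated, with fixed per‑stage costs; the function names and the cost model are illustrative assumptions, not a description of how xLLM‑Engine schedules its pipelines.

```python
def sequential_time(n: int, compute: float, comm: float) -> float:
    """Total time when every micro-batch computes, then communicates, in turn."""
    return n * (compute + comm)


def overlapped_time(n: int, compute: float, comm: float) -> float:
    """Total time when communication of batch i overlaps compute of batch i+1."""
    if n == 0:
        return 0.0
    # The first compute and the last communication cannot be hidden;
    # in between, the slower stage sets the pace.
    return compute + comm + (n - 1) * max(compute, comm)


if __name__ == "__main__":
    n, compute, comm = 4, 3.0, 2.0
    print(sequential_time(n, compute, comm))  # 20.0
    print(overlapped_time(n, compute, comm))  # 14.0
```

In this model the benefit is largest when compute and communication costs are similar, because the faster stage is then almost entirely hidden behind the slower one.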

The framework natively supports GPU, NPU and MLU. A unified inference abstraction layer masks hardware and model differences, enabling LLM, VLM, DiT, text‑to‑image/video and generative‑recommendation models to run on a mix of domestic chips.
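One common way to build such an abstraction layer is a backend interface plus a registry, sketched below. Everything here is hypothetical: `Backend`, `EchoBackend`, `register` and `infer` are illustrative names rather than xLLM's API, and the stand‑in backend simply returns its input where a real one would invoke device kernels.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    """Interface every device backend is assumed to implement."""
    name: str

    def run(self, model: str, inputs: List[int]) -> List[int]: ...


class EchoBackend:
    """Stand-in device; a real backend would launch device kernels here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, model: str, inputs: List[int]) -> List[int]:
        return list(inputs)


_REGISTRY: Dict[str, Backend] = {}


def register(backend: Backend) -> None:
    """Make a backend available under its device name."""
    _REGISTRY[backend.name] = backend


def infer(device: str, model: str, inputs: List[int]) -> List[int]:
    """Dispatch a request without the caller knowing device details."""
    try:
        backend = _REGISTRY[device]
    except KeyError:
        raise ValueError(f"no backend registered for {device!r}") from None
    return backend.run(model, inputs)
```

With this shape, callers write `infer("npu", "llm", tokens)` and `infer("gpu", "llm", tokens)` identically, and supporting a new chip means registering one more backend rather than changing call sites.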

Technical Highlights

Architectural Innovation – Service‑engine decoupling allows independent evolution of scheduling and compute while delivering synergistic performance gains.

Performance Breakthrough – Multi‑level pipelines, adaptive graph mode and dynamic PD separation significantly improve throughput and resource utilisation under strict SLO constraints; JD states that the results surpass existing state‑of‑the‑art inference frameworks.

Heterogeneous Unification – The abstraction layer supports LLM, VLM, DiT, multimodal generation and recommendation models across multiple domestic chips, filling the “heterogeneous chip unified inference” gap.

High‑Availability Guarantees – Global KV cache management, distributed fast fault recovery, health monitoring and automatic inspection protect stable large‑scale production.

Domestic‑Chip Adaptation – A single framework covers a variety of domestic chips, lowering the barrier to deployment on domestic hardware.

Industrial‑Scale Validation

Real‑world deployments demonstrate concrete gains:

E‑commerce customer‑service models: cluster utilisation increased by more than 35%, and P99 latency fell by 28% during peak traffic.

Power‑inspection workloads: inspection efficiency grew roughly 3×, the outage rate fell by 30%, and emergency‑repair efficiency improved by 20%.

Public‑safety edge inference: inspection speed rose by 227%, concurrent capacity improved by 127%, and time‑to‑first‑token (TTFT) was cut by 50%.
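The figures above mix multipliers ("3×") with percentage changes ("227%"), which are easy to misread. The helper below converts a percentage change into a multiplier of the baseline, assuming each figure is a relative change against the pre‑deployment value; the source does not state the baselines, so this is an interpretation aid only.

```python
def after_increase(percent: float) -> float:
    """Multiplier of the baseline after rising by `percent` percent."""
    return 1 + percent / 100


def after_reduction(percent: float) -> float:
    """Multiplier of the baseline after falling by `percent` percent."""
    return 1 - percent / 100


if __name__ == "__main__":
    print(f"{after_increase(227):.2f}x")   # inspection speed
    print(f"{after_increase(127):.2f}x")   # concurrent capacity
    print(f"{after_reduction(50):.2f}x")   # TTFT
```

Read this way, a 227% rise means roughly 3.27 times the baseline speed, a 127% improvement means roughly 2.27 times the baseline capacity, and a 50% cut halves TTFT.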

Community Adoption

Since open‑sourcing, the project has attracted >1.4k GitHub stars and 235 forks. Major domestic chip vendors and large‑model providers have become core contributors and sponsors.

Roadmap and Ecosystem Plan

After joining OpenAtom, the project will focus on three areas:

Co‑building an ecosystem with model, chip and cloud partners to create a “chip + framework + solution” stack.

Promoting the proven capabilities to additional industries.

Driving the creation of standards for large‑model inference engines to accelerate domestic adoption.

Planned milestones:

By 2026: full multimodal support (text‑to‑image/video, Omni), comprehensive adaptation to mainstream domestic chips, launch of an enterprise‑grade commercial service, and expansion of the contributor community to ~200 developers.

From 2027 onward: deep industry penetration and establishment of Oxygen xLLM as the de‑facto standard for domestic‑chip model inference.

Repository

GitHub: https://github.com/jd-opensource/xllm
