How to Seamlessly Migrate AI Workloads from Nvidia GPUs to Domestic Accelerators

This article explains why migrating AI applications from Nvidia GPUs to domestic Chinese accelerators is urgent, outlines the technical challenges, and presents JD Cloud's JoyScale zero‑perception migration stack with hardware, software, model, and inference optimizations for real‑world scenarios.

JD Cloud Developers

In the digital age, AI technology is reshaping life and work, but its rapid deployment depends on powerful compute, traditionally provided by Nvidia GPUs. International restrictions and growing competition make reliance on a single vendor untenable, prompting a shift to domestic GPUs to ensure security and sustainability of China’s AI industry.

1. Urgency and Necessity of Migration

(1) International Challenges

Recent U.S. export bans on high‑end AI chips have severely impacted China’s AI sector. On December 3, 2024, four Chinese semiconductor associations called for cautious procurement of U.S. chips and expanded cooperation with other regions, highlighting the urgent need for autonomous AI chip capabilities.

(2) Need for Technological Self‑Reliance

Dependence on imported chips brings supply risks, potential technical lock‑outs, and security threats. The rise of domestic AI chips offers new options; migrating AI workloads to domestic GPUs reduces foreign reliance, ensures autonomous control, and protects national information security.

(3) Domestic Market Potential

China’s AI market is vast, covering smart security, autonomous driving, medical imaging, and fintech. Domestic GPUs are continuously improving and can now replace imported chips, meeting diverse market demands while providing a large ecosystem for domestic chip development.

2. What Makes Migration Difficult?

The core difficulty lies in the lack of an end‑to‑end migration toolchain and solution built for domestic GPUs, which would allow algorithm engineers to switch compute without code changes.

3. JoyScale “Zero‑Perception” Migration Stack

JD Cloud’s JoyScale heterogeneous compute management platform, refined on millions of cards, has completed migration for over 40 mainstream models and distilled a full‑stack solution with four core principles:

Zero intrusion: algorithms run unchanged; migration is achieved by backend switching.

Verifiable: each step has a GPU baseline for quantifiable, rollback‑able error.

Scalable: new chips are added via a plug‑in approach without altering the core framework.

Full‑link: end‑to‑end coverage from training, fine‑tuning, inference to online monitoring.

3.1 System Architecture

3.2 Migration Solution

Hardware Adaptation

Accelerator Scheduling Adaptation – develop scheduling plugins that understand domestic inter‑card interconnect topology (e.g., Ascend 910B's HCCS requires all cards assigned to a pod to sit on the same HCCS ring).
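The ring‑affinity requirement above can be sketched as a placement check. This is a minimal illustration, not JoyScale's actual scheduler: the card‑to‑ring map is hypothetical, and a real implementation would read topology from node labels or a device plugin.

```python
# Minimal sketch: topology-aware card placement for NPU workers.
# HCCS_RING is a hypothetical card-index -> HCCS-ring map for one
# Ascend 910B node (two rings of four cards).
from collections import defaultdict

HCCS_RING = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

def cards_share_ring(cards):
    """Return True if all requested cards sit on one HCCS ring."""
    return len({HCCS_RING[c] for c in cards}) == 1

def pick_cards(free_cards, n):
    """Pick n free cards from a single HCCS ring, or None if impossible."""
    by_ring = defaultdict(list)
    for c in free_cards:
        by_ring[HCCS_RING[c]].append(c)
    for ring_cards in by_ring.values():
        if len(ring_cards) >= n:
            return sorted(ring_cards)[:n]
    return None  # no single ring can host the pod; do not split across rings
```

A scheduler plugin built this way rejects placements that would force inter‑card traffic off the ring, rather than letting the job start and run slowly.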

Operator Support Analysis – use tools like PyTorch Profiler to compare GPU operators with the API list of domestic GPUs and develop missing operators.
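The comparison step can be sketched as a set difference over operator names. In practice the profiled names would come from torch.profiler (e.g. the aten:: kernels in prof.key_averages()) and the supported list from the vendor's operator documentation; both lists below are illustrative.

```python
# Minimal sketch: operator-coverage gap analysis between a profiled GPU
# workload and a domestic accelerator's supported-operator list.

def coverage_report(profiled_ops, supported_ops):
    """Split profiled operators into covered vs. missing on the target chip."""
    profiled = set(profiled_ops)
    supported = set(supported_ops)
    covered = sorted(profiled & supported)
    missing = sorted(profiled - supported)
    return {"covered": covered, "missing": missing,
            "coverage": len(covered) / max(len(profiled), 1)}

# Illustrative operator names, not a real model's profile.
profiled = ["aten::addmm", "aten::softmax", "aten::custom_fused_rope"]
npu_supported = ["aten::addmm", "aten::softmax", "aten::layer_norm"]
report = coverage_report(profiled, npu_supported)
# report["missing"] is the development backlog before migration can proceed.
```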

Performance Tuning – profile each operator, optimize slow ones by aligning data, using continuous memory, or fusing operators via vendor APIs.

Software Adaptation

Program Migration – replace torch.cuda.xxx() calls with torch.npu.xxx() without altering algorithm code.

Framework Optimization – provide a unified API that lets NPU and GPU users switch training backends seamlessly and at zero cost.
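One way to make backend switching zero‑cost is a small device facade that algorithm code calls instead of torch.cuda.xxx() or torch.npu.xxx() directly. The sketch below is an assumption about how such a layer could look (the environment variable and function names are illustrative; JoyScale's actual API is not public):

```python
# Minimal sketch of a unified device facade. Algorithm code asks the facade
# for a device string; switching backends is a config change, not a code change.
import os

_BACKEND = os.environ.get("ACCEL_BACKEND", "cuda")  # e.g. "cuda" or "npu"

def current_backend():
    return _BACKEND

def device(index=0):
    """Return a torch-style device string for the active backend, e.g. 'npu:0'."""
    return f"{_BACKEND}:{index}"

def translate_device(device_str, target=None):
    """Rewrite 'cuda:1' -> 'npu:1' so GPU-era configs and checkpoints run unchanged."""
    target = target or _BACKEND
    if ":" in device_str:
        _, idx = device_str.split(":", 1)
        return f"{target}:{idx}"
    return target
```

With this in place, a training script written as model.to(device()) runs on either backend, and stored configs referencing "cuda:N" can be rewritten on load.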

Model Adaptation

Model Quantization – reduce compute and storage requirements to improve efficiency on domestic GPUs.
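The core arithmetic of post‑training quantization can be shown in a few lines. This is a deliberately simplified sketch using a single symmetric int8 scale and plain Python lists; production flows use per‑channel scales, calibration data, and the vendor toolchain.

```python
# Minimal sketch: symmetric int8 quantization of one weight tensor.

def quantize_int8(weights):
    """Map floats to int8 codes with a single symmetric scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error per weight is bounded by scale / 2 (int8 rounding error).
```

Storing 1 byte per weight instead of 4 (fp32) or 2 (fp16) is what cuts the memory and bandwidth cost on the target accelerator.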

Software/Hardware Co‑Optimization – use Triton compilation, CANN fusion, and techniques such as flash attention, rotary embedding, and sub‑graph scheduling to achieve high MFU (model FLOPs utilization) and near‑linear scaling for trillion‑parameter models.

Inference Optimization

Apply GE graph compilation and ATB high‑performance operators to deeply optimize operations such as Paged Attention and Flash Attention, with support for W8A8 SmoothQuant and W4A16 AWQ quantization.
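The SmoothQuant idea behind the W8A8 flow is worth a worked sketch: it migrates quantization difficulty from activations (which have outlier channels) to weights, via a per‑channel scale s_j = max|X_j|^alpha / max|W_j|^(1-alpha). The numbers below are illustrative, not calibration data from a real model.

```python
# Minimal sketch of SmoothQuant scale migration for W8A8 quantization.

def smooth_scales(act_max, weight_max, alpha=0.5):
    """Per-channel smoothing scales from activation/weight abs-max statistics."""
    return [(a ** alpha) / (w ** (1 - alpha)) for a, w in zip(act_max, weight_max)]

def apply_smoothing(act_max, weight_max, scales):
    """Activations shrink by s per channel; weights grow by s to compensate."""
    new_act = [a / s for a, s in zip(act_max, scales)]
    new_w = [w * s for w, s in zip(weight_max, scales)]
    return new_act, new_w

act = [100.0, 4.0]   # channel 0 is an activation outlier
wgt = [1.0, 1.0]
s = smooth_scales(act, wgt)
new_act, new_w = apply_smoothing(act, wgt, s)
# With alpha=0.5 each channel ends up with equal activation/weight ranges
# (10 and 10 for the outlier channel), which int8 handles far better than
# a 100x spread between channels.
```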

Deploy dual‑backend hot‑standby serving, gradually ramping traffic to domestic compute with automatic rollback to Nvidia GPUs if failure rate exceeds 0.1%.
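The ramp‑and‑rollback policy above can be sketched as a small control function. Step size and the shape of the control loop are illustrative assumptions; only the 0.1% threshold comes from the text.

```python
# Minimal sketch: traffic ramping for dual-backend hot-standby serving.
FAILURE_THRESHOLD = 0.001  # 0.1% failure rate triggers automatic rollback

def next_traffic_share(current_share, failure_rate, step=0.1):
    """Return (new_share_on_domestic_backend, rolled_back)."""
    if failure_rate > FAILURE_THRESHOLD:
        return 0.0, True               # roll all traffic back to the GPU pool
    return min(1.0, current_share + step), False
```

A serving gateway would call this on each evaluation window, so a bad canary never holds more than one step's worth of traffic before rollback.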

Unified Scheduling and Monitoring

Develop a cloud‑native, ten‑thousand‑card heterogeneous scheduling system that auto‑detects CPU NUMA and network topology, applying gang scheduling and resource pooling for maximal efficiency.
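Gang scheduling, the all‑or‑nothing admission rule mentioned above, can be sketched in a few lines. This is an illustrative model of the policy, not the platform's scheduler: free capacity is given as a simple node map.

```python
# Minimal sketch: gang scheduling places all workers of a distributed job
# at once, or admits none of them (avoiding partial allocations that
# deadlock while holding cards).

def gang_schedule(job_size, free_per_node):
    """Return {node: cards} covering job_size workers, or None if it won't fit."""
    alloc, remaining = {}, job_size
    # Prefer the emptiest nodes first to reduce fragmentation.
    for node, free in sorted(free_per_node.items(), key=lambda kv: -kv[1]):
        take = min(free, remaining)
        if take:
            alloc[node] = take
            remaining -= take
        if remaining == 0:
            return alloc
    return None  # insufficient capacity: admit nothing
```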

Provide visual monitoring of GPU/NPU utilization, memory usage, service throughput, failure rates, latency, and token counts.

4. Typical Deployment Scenarios

Retail – multimodal models analyze product videos, extracting tags; migration to domestic NPU yields comparable accuracy and similar response latency.

Intelligent Customer Service – large‑model agents fine‑tuned on domestic compute produce analysis results identical to Nvidia‑based models, with 96% of issues routed to the same downstream paths.

Logistics – address parsing models fine‑tuned on domestic hardware achieve 91.03% accuracy versus 91.08% on Nvidia, enabling AI pre‑sorting that processes over 30,000 abnormal addresses daily across multiple provinces.

5. Conclusion

Migrating AI applications from Nvidia GPUs to domestic GPUs is not optional but essential for the security and sustainable growth of China’s AI industry. JD Cloud’s JoyScale offers a mature, complete migration stack that lowers cost, speeds up migration, and ensures high‑performance operation on domestic accelerators, allowing customers to focus on algorithmic innovation.

Tags: heterogeneous computing, model quantization, AI migration, domestic GPUs, JoyScale
Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
