Deploying DeepSeek R1 on Huawei Ascend 910B: Weight Conversion and Troubleshooting
This article details a step‑by‑step deployment of the DeepSeek R1 model on Huawei Ascend 910B NPUs, covering FP8‑to‑BF16 weight conversion, custom container image preparation, configuration of MindIE services, common pitfalls, and practical troubleshooting tips for large‑scale inference.
Overview
The DeepSeek R1 model ships as FP8 weights with 671B parameters, and it cannot run directly on Huawei Ascend 910B NPUs because the hardware lacks FP8 support. The weights must therefore be converted to BF16, which doubles the memory footprint to roughly 1.4 TB and requires two machines with a total of 32 Ascend 910B cards.
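The sizing above follows from simple arithmetic. A quick sanity check (model and card counts are from this article; the per-card split is my own estimate, ignoring activations and KV cache):

```python
# Back-of-the-envelope sizing for R1 in BF16.
params = 671e9       # 671B parameters
bf16_bytes = 2       # BF16 uses 2 bytes per parameter (FP8 used 1)
total_tb = params * bf16_bytes / 1e12
cards = 32           # two machines, 32 Ascend 910B cards in total
per_card_gb = params * bf16_bytes / 1e9 / cards
print(f"weights: {total_tb:.2f} TB, ~{per_card_gb:.0f} GB per card")
```

About 42 GB of weights per card, which fits within a 910B's HBM with headroom for activations.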
Overall Plan
1. Download the R1 model files.
2. Convert the FP8 weights to BF16 using the provided script.
3. Build or pull a compatible MindIE container image (ARM‑based images are unsuitable for x86 hosts).
4. Configure config.json for multi‑node inference.
5. Start the mindie‑service and verify successful inference.
Weight Conversion
The official DeepSeek‑V3 repository does not contain conversion code, but the required script is available at:
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py
The script must run on hardware that supports FP8 (e.g., recent NVIDIA GPUs). When only BF16‑capable NPUs are available, the conversion must be performed on a separate machine before deployment.
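The real fp8_cast_bf16.py does its work with PyTorch and a Triton kernel, but the core idea is block-wise dequantization: each tile of the FP8 weight matrix is multiplied by its stored per-block scale. A minimal NumPy sketch of that idea (float32 stands in for both FP8 and BF16, since NumPy has neither dtype; the 128×128 block size matches DeepSeek's published quantization scheme):

```python
import numpy as np

BLOCK = 128  # DeepSeek quantizes FP8 weights in 128x128 tiles

def dequantize_blockwise(weight, scale_inv, block=BLOCK):
    """Multiply each (block x block) tile of `weight` by its stored scale.

    float32 stands in for FP8/BF16 here; the real script performs the
    same math on torch tensors.
    """
    rows, cols = weight.shape
    # Expand the per-tile scale map to the full weight shape, then multiply.
    scales = np.repeat(np.repeat(scale_inv, block, axis=0), block, axis=1)
    return (weight * scales[:rows, :cols]).astype(np.float32)

# One scale per 128x128 tile of a 256x256 weight matrix.
w = np.ones((256, 256), dtype=np.float32)
s = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
deq = dequantize_blockwise(w, s)
```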
Container Image and Environment
The deployment documentation provides a container image that appears to target the ARM architecture. Using it on x86 hosts leads to failures, so the author switched to a manually built image with the necessary MindIE software and six proprietary POC packages supplied by Huawei.
Configuration Files
rank_table_file : Must be generated programmatically; fields such as server_count should remain strings, not integers. The server_id field can accept either host or container IP addresses.
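A generator along these lines avoids the string-vs-integer pitfall. The field names follow Huawei's HCCL rank-table format as I understand it and should be verified against your CANN/MindIE release; the IPs below are placeholders:

```python
import json

def build_rank_table(servers):
    """Build an HCCL-style rank table; numeric fields are strings on purpose."""
    rank = 0
    server_list = []
    for server_ip, device_ips in servers:
        devices = []
        for device_id, device_ip in enumerate(device_ips):
            devices.append({
                "device_id": str(device_id),  # string, not int
                "device_ip": device_ip,       # NPU mesh-network IP
                "rank_id": str(rank),         # global rank, also a string
            })
            rank += 1
        server_list.append({"server_id": server_ip, "device": devices})
    return {
        "server_count": str(len(servers)),    # string, not int
        "server_list": server_list,
        "status": "completed",
        "version": "1.0",
    }

# Two hosts with two NPUs each (placeholder addresses).
table = build_rank_table([
    ("10.0.0.1", ["29.1.1.1", "29.1.1.2"]),
    ("10.0.0.2", ["29.1.2.1", "29.1.2.2"]),
])
print(json.dumps(table, indent=2))
```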
config.json : Based on the example in the MindIE documentation, enable multi‑node inference, set ipAddress and managementIpAddress correctly, and adjust npuDeviceIds and worldSize for single‑node tests (these are ignored in multi‑node mode).
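An illustrative fragment showing the fields mentioned above. The nesting and the multiNodesInferEnabled flag name reflect my reading of the MindIE documentation and may differ across versions; the IPs are placeholders:

```json
{
  "ServerConfig": {
    "ipAddress": "10.0.0.1",
    "managementIpAddress": "10.0.0.1"
  },
  "BackendConfig": {
    "multiNodesInferEnabled": true,
    "npuDeviceIds": [[0, 1, 2, 3, 4, 5, 6, 7]],
    "worldSize": 8
  }
}
```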
Troubleshooting
The mindie‑service often fails silently, producing no logs in the configured log file. Common causes include malformed rank_table_file, missing NPU network connectivity, or absent Python packages. The hidden log directory $HOME/mindie contains useful output that revealed a missing pip package as the root cause.
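When the configured log file stays empty, a small helper can surface the most recent files under that hidden directory (the ~/mindie path is the one observed in this deployment; the exact subdirectory layout varies by MindIE version):

```python
import glob
import os

def newest_logs(root, limit=3):
    """Return the most recently modified *.log files under `root`."""
    paths = glob.glob(os.path.join(root, "**", "*.log"), recursive=True)
    return sorted(paths, key=os.path.getmtime, reverse=True)[:limit]

# The configured log file produced nothing; the useful output lived
# under the hidden $HOME/mindie tree.
for path in newest_logs(os.path.expanduser("~/mindie")):
    print(path)
```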
Open Issues
Transferring the 1.4 TB model over a 2 GB/s link takes roughly 12 minutes, and loading the model inside MindIE can exceed one hour; Huawei suggests setting `export OMP_NUM_THREADS=1` to reduce load time to about 10 minutes.
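The transfer time is straightforward to estimate from the figures above:

```python
# Rough transfer-time math for moving the converted weights.
model_bytes = 1.4e12   # ~1.4 TB of BF16 weights
link_rate = 2e9        # ~2 GB/s network link
transfer_minutes = model_bytes / link_rate / 60
print(f"transfer: ~{transfer_minutes:.0f} min")
```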
Further optimization of model loading and container startup remains an open area for improvement.