Getting Started with Huawei Ascend AI Accelerators
This guide walks through the fundamentals of Huawei Ascend NPU hardware, the CANN software stack, driver and firmware installation, Kubernetes integration via Docker runtime and device plugin, and a complete ResNet‑50 inference demo on Ascend 310P.
Overview
Huawei Ascend is an AI‑focused NPU series developed by HiSilicon. The Ascend 310 targets edge workloads with up to 16 TOPS (INT8) or 8 TOPS (FP16), while the Ascend 910 delivers up to 320 TFLOPS (FP16) or 640 TOPS (INT8). Both chips are built on advanced process nodes (12 nm for 310, N7+ for 910) and consume 8 W and 310 W respectively.
Software Stack (CANN)
The Compute Architecture for Neural Networks (CANN) is Huawei's heterogeneous AI computing framework, analogous to Nvidia's CUDA. It provides drivers, firmware, and a set of libraries that expose the Ascend processors to AI frameworks. The community edition can be downloaded freely for non-commercial use; a commercial edition is required for production deployments and must be obtained through an application process.
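After the CANN toolkit is installed on a host, its tools (ATC, the ACL libraries) are made visible through environment variables. A minimal sketch, assuming the default /usr/local/Ascend prefix of the community toolkit (adjust the path if you installed elsewhere):
# Load the CANN environment for the current shell (default prefix assumed).
$ source /usr/local/Ascend/ascend-toolkit/set_env.sh
# ATC and the ACL libraries should now be on PATH / LD_LIBRARY_PATH; a quick smoke test:
$ atc --help | head -n 5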
Kubernetes Integration
To run Ascend NPU workloads on Kubernetes, the following steps are required:
Verify the NPU hardware with lspci or npu‑smi info.
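On a bare-metal host these checks might look like the following sketch; the PCI class string can vary by model, so treat the grep pattern as an assumption:
$ lspci | grep -i "Processing accelerators"    # Ascend cards typically enumerate under this PCI class
$ npu-smi info                                 # reports chip type, health, power and memory once the driver is loaded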
Download the appropriate driver (A300t-9000-npu-driver_*.run) and firmware (A300t-9000-npu-firmware_*.run) packages from the community download page.
Install the driver first, then the firmware (or in the reverse order for a full reinstall).
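A first-time installation usually looks like the sketch below; the file names are examples matching the packages above, and the --full flag is the usual full-install mode of Huawei's .run installers (check ./<package>.run --help for your version):
$ chmod +x A300t-9000-npu-driver_*.run A300t-9000-npu-firmware_*.run
$ ./A300t-9000-npu-driver_*.run --full        # driver first
$ ./A300t-9000-npu-firmware_*.run --full      # then firmware
$ reboot                                      # reload kernel modules, then re-check with npu-smi info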
Install the Ascend Docker Runtime:
$ wget -c https://mindx.obs.cn-south-1.myhuaweicloud.com/OpenSource/MindX/MindX%205.0.RC2/MindX%20DL%205.0.RC2/Ascend-docker-runtime_5.0.RC2_linux-x86_64.run
$ chmod u+x Ascend-docker-runtime_5.0.RC2_linux-x86_64.run
$ ./Ascend-docker-runtime_5.0.RC2_linux-x86_64.run --install
Configure containerd to use the Ascend runtime by editing /etc/containerd/config.toml and setting the runtime path to /usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime.
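The exact stanza depends on the containerd version; a plausible sketch for containerd 1.6+ with the CRI plugin and the runc v2 shim (the section layout here is an assumption, not quoted from the source) is:
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "ascend"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ascend]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ascend.options]
  BinaryName = "/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime"
Restart containerd afterwards (systemctl restart containerd) so the new runtime takes effect.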
Deploy the device plugin (MindX DL) either by building the image from source:
$ wget -c https://mindx.obs.cn-south-1.myhuaweicloud.com/OpenSource/MindX/MindX%205.0.RC2/MindX%20DL%205.0.RC2.1/Ascend-mindxdl-device-plugin_5.0.RC2.1_linux-x86_64.zip
$ docker build -t ascend-k8sdeviceplugin:v5.0.RC2 .
or by pulling the pre-built image (requires an enterprise AscendHub account):
$ docker pull ascendhub.huawei.com/public-ascendhub/ascend-k8sdeviceplugin:v5.0.RC2
Apply the plugin manifest:
$ kubectl apply -f device-plugin-310-v5.0.RC2.yaml
$ kubectl label nodes {node-name} accelerator=huawei-Ascend310
If the NPU is not detected in certain virtualized environments, the article suggests two fixes: mounting dmidecode or systemd-detect-virt into the plugin container, or adding apt-get install -y systemd to the plugin Dockerfile.
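Once the plugin pod is running, the node should advertise the NPU as an allocatable extended resource; a quick check (the exact resource name depends on the card and plugin version) looks like:
$ kubectl describe node {node-name} | grep -i "huawei.com/Ascend"
# Expect e.g. huawei.com/Ascend310: 1 (or huawei.com/Ascend310P) under both Capacity and Allocatable.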
Demo: ResNet‑50 Inference on Ascend 310P
Download the sample code from the Ascend GitHub repository, fetch the Caffe model files, and convert them to an Ascend‑compatible .om model using the ATC tool:
$ atc --model=caffe_model/resnet50.prototxt \
--weight=caffe_model/resnet50.caffemodel \
--framework=0 \
--output=model/resnet50 \
--soc_version=Ascend310P3 \
--input_format=NCHW \
--input_fp16_nodes=data \
--output_type=FP32 \
--out_nodes=prob:0
The conversion produces resnet50.om. Deploy a Kubernetes Deployment that requests one Ascend310P device:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: ascend-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ascend-test
  template:
    metadata:
      labels:
        app: ascend-test
    spec:
      containers:
        - name: container-1
          image: ascendhub.huawei.com/public-ascendhub/ascend-pytorch:23.0.RC1-centos7.6
          command: ["top", "-b"]
          resources:
            limits:
              cpu: "2"
              huawei.com/Ascend310P: "1"
              memory: 16Gi
            requests:
              cpu: "2"
              huawei.com/Ascend310P: "1"
              memory: 16Gi
Inside the pod, run the inference script:
$ python3 ./src/acl_net.py
The output shows a 76 % confidence for class 161 (basset).
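The same run can be driven from outside the container with kubectl exec; the sketch below assumes the sample code sits in the pod's working directory exactly as in the demo:
$ POD=$(kubectl get pods -l app=ascend-test -o jsonpath='{.items[0].metadata.name}')
$ kubectl exec -it "$POD" -- python3 ./src/acl_net.py
# The top-1 prediction should again be class 161 (basset) at roughly 76 % confidence.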
CCE (Cloud Container Engine) Usage
After creating a CCE cluster, install the CCE AI suite from the plugin marketplace, add NPU nodes, and configure NPU quotas in the workload definition. The console displays monitoring metrics such as compute utilization, memory usage, and memory occupancy for each node.