Getting Started with Huawei Ascend AI Accelerators

This guide walks through the fundamentals of Huawei Ascend NPU hardware, the CANN software stack, driver and firmware installation, Kubernetes integration via Docker runtime and device plugin, and a complete ResNet‑50 inference demo on Ascend 310P.

Infra Learning Club

Overview

Huawei Ascend is an AI‑focused NPU series developed by HiSilicon. The Ascend 310 targets edge workloads with up to 16 TOPS (INT8) or 8 TFLOPS (FP16), while the Ascend 910 delivers up to 640 TOPS (INT8) or 320 TFLOPS (FP16). The chips are built on different process nodes (12 nm for the 310, N7+ for the 910) and consume about 8 W and 310 W respectively.

Software Stack (CANN)

The Compute Architecture for Neural Networks (CANN) is Huawei’s heterogeneous AI computing framework, analogous to Nvidia’s CUDA. It provides drivers, firmware, and a set of libraries that let AI frameworks target the Ascend processors. The community edition can be downloaded freely for non‑commercial use; a commercial edition is required for production deployments and must be obtained through an application process.
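The toolkit portion of CANN is distributed as a self‑extracting .run package, installed much like the driver and firmware below. A sketch of the typical flow follows; the package filename is a placeholder (download the real file from the Ascend community site first), and every step falls back to a message when the package is absent:

```shell
# Make the downloaded toolkit package executable and install it.
# Filename/version is a placeholder -- an assumption, not the exact name.
chmod +x Ascend-cann-toolkit_*.run 2>/dev/null \
  || echo "toolkit package not present, skipping chmod"
./Ascend-cann-toolkit_*.run --install 2>/dev/null \
  || echo "toolkit package not present, skipping install"

# The toolkit ships an environment script; source it in every shell
# that uses ATC or the runtime libraries.
if [ -f /usr/local/Ascend/ascend-toolkit/set_env.sh ]; then
  . /usr/local/Ascend/ascend-toolkit/set_env.sh
else
  echo "set_env.sh not found (toolkit not installed on this machine)"
fi
```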

Kubernetes Integration

To run Ascend NPU workloads on Kubernetes, the following steps are required:

Verify the NPU hardware with lspci or npu‑smi info.

Download the appropriate driver (A300t‑9000‑npu‑driver_*.run) and firmware (A300t‑9000‑npu‑firmware_*.run) packages from the community download page.

Install the driver first, then the firmware (or the reverse order for a full reinstall over an existing installation).

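The hardware verification step can be sketched as follows; the grep pattern is an assumption (vendor strings vary by card), and both commands fall back to a message when the hardware or tools are missing:

```shell
# Look for the NPU on the PCI bus.
lspci 2>/dev/null | grep -iE "ascend|huawei" \
  || echo "no Ascend device visible via lspci"

# npu-smi ships with the NPU driver and reports chip health,
# temperature, and memory usage per device.
npu-smi info 2>/dev/null \
  || echo "npu-smi not available (driver not installed yet?)"
```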
Install the Ascend Docker Runtime:

$ wget -c https://mindx.obs.cn-south-1.myhuaweicloud.com/OpenSource/MindX/MindX%205.0.RC2/MindX%20DL%205.0.RC2/Ascend-docker-runtime_5.0.RC2_linux-x86_64.run
$ chmod u+x Ascend-docker-runtime_5.0.RC2_linux-x86_64.run
$ ./Ascend-docker-runtime_5.0.RC2_linux-x86_64.run --install
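After the installer finishes, a quick sanity check confirms Docker sees the new runtime. This sketch assumes the installer registers itself in /etc/docker/daemon.json (behavior inferred from the package, not verified here); both commands fall back to a message when Docker is absent:

```shell
# Check which runtimes Docker reports.
docker info 2>/dev/null | grep -i runtime \
  || echo "docker not available on this host"

# Inspect the daemon config the installer is expected to modify.
cat /etc/docker/daemon.json 2>/dev/null \
  || echo "/etc/docker/daemon.json not found"
```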

Configure containerd to use the Ascend runtime by editing /etc/containerd/config.toml and pointing the runtime binary at /usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime.
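A minimal sketch of that config.toml change, assuming containerd's v2 config layout with the CRI plugin (section names vary across containerd versions, so compare against your existing file before applying):

```toml
# /etc/containerd/config.toml -- sketch, not a drop-in file.
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "ascend"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ascend]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.ascend.options]
  BinaryName = "/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-runtime"
```

Restart containerd (systemctl restart containerd) after editing so the new runtime takes effect.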

Deploy the device plugin (MindX DL) either by building the image from source:

$ wget -c https://mindx.obs.cn-south-1.myhuaweicloud.com/OpenSource/MindX/MindX%205.0.RC2/MindX%20DL%205.0.RC2.1/Ascend-mindxdl-device-plugin_5.0.RC2.1_linux-x86_64.zip
$ unzip Ascend-mindxdl-device-plugin_5.0.RC2.1_linux-x86_64.zip
$ docker build -t ascend-k8sdeviceplugin:v5.0.RC2 .

or by pulling the pre‑built image (requires an enterprise AscendHub account):

$ docker pull ascendhub.huawei.com/public-ascendhub/ascend-k8sdeviceplugin:v5.0.RC2

Apply the plugin manifest:

$ kubectl apply -f device-plugin-310-v5.0.RC2.yaml
$ kubectl label nodes {node-name} accelerator=huawei-Ascend310

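Once the plugin is running, the node should advertise the NPU as an allocatable extended resource (huawei.com/Ascend310 for the 310, huawei.com/Ascend310P for the 310P). A quick check, with a fallback message when no cluster is reachable:

```shell
# Grep the node description for the NPU extended resource.
kubectl describe nodes 2>/dev/null | grep -i "huawei.com/Ascend" \
  || echo "no NPU resource registered (or kubectl/cluster unavailable)"
```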
If the NPU is not detected in certain virtualized environments, two workarounds are known to help: mount dmidecode or systemd-detect-virt from the host into the plugin container, or add apt-get install -y systemd to the plugin Dockerfile.
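The Dockerfile workaround can be sketched as follows; the base image tag matches the build step above, and an apt-based (Debian/Ubuntu) base image is assumed:

```dockerfile
# Extend the device-plugin image so systemd (and with it
# systemd-detect-virt) is available inside the container.
FROM ascend-k8sdeviceplugin:v5.0.RC2
RUN apt-get update \
 && apt-get install -y systemd \
 && rm -rf /var/lib/apt/lists/*
```

Rebuild and redeploy the plugin DaemonSet with the resulting image.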

Demo: ResNet‑50 Inference on Ascend 310P

Download the sample code from the Ascend GitHub repository, fetch the Caffe model files, and convert them to an Ascend‑compatible .om model using the ATC tool:

$ atc --model=caffe_model/resnet50.prototxt \
    --weight=caffe_model/resnet50.caffemodel \
    --framework=0 \
    --output=model/resnet50 \
    --soc_version=Ascend310P3 \
    --input_format=NCHW \
    --input_fp16_nodes=data \
    --output_type=FP32 \
    --out_nodes=prob:0

The conversion produces resnet50.om. Deploy a Kubernetes Deployment that requests one Ascend310P device:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: ascend-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ascend-test
  template:
    metadata:
      labels:
        app: ascend-test
    spec:
      containers:
      - name: container-1
        image: ascendhub.huawei.com/public-ascendhub/ascend-pytorch:23.0.RC1-centos7.6
        command: ["top", "-b"]
        resources:
          limits:
            cpu: "2"
            huawei.com/Ascend310P: "1"
            memory: 16Gi
          requests:
            cpu: "2"
            huawei.com/Ascend310P: "1"
            memory: 16Gi

Inside the pod, run the inference script:

$ python3 ./src/acl_net.py

The output reports class 161 (basset) with about 76 % confidence.
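The same run can be triggered from outside the cluster via kubectl exec; the deployment name comes from the manifest above and the script path from the sample repository, with a fallback message when no cluster is reachable:

```shell
# Execute the inference script inside the deployment's pod.
kubectl exec deploy/ascend-test -- python3 ./src/acl_net.py 2>/dev/null \
  || echo "kubectl exec failed (no cluster reachable from here)"
```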

CCE (Cloud Container Engine) Usage

After creating a CCE cluster, install the CCE AI suite from the plugin marketplace, add NPU nodes, and configure NPU quotas in the workload definition. The console displays monitoring metrics such as compute utilization, memory usage, and memory occupancy for each node.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
