Deploying Qwen3-8B Large Language Model on Alibaba Cloud ACK with ACS GPU Acceleration
This guide explains how to prepare, deploy, and verify the Qwen3‑8B large language model on an Alibaba Cloud Container Service for Kubernetes (ACK) cluster using ACS GPU resources, covering prerequisites, model download, storage setup, Kubernetes manifests, and testing the inference service.
The document introduces Qwen3, the latest hybrid inference model from the Qwen series, and describes its competitive performance on various benchmarks, supporting 119 languages and multiple reasoning modes.
It then outlines the prerequisites: an ACK cluster with GPU nodes, kubectl access, and the ability to add a GPU node pool.
Step 1: Prepare Qwen3‑8B model files – install git-lfs , clone the model from ModelScope, and upload the files to OSS using ossutil . Example commands: git lfs install GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Qwen/Qwen3-8B cd Qwen3-8B/ git lfs pull ossutil mkdir oss:// /models/Qwen3-8B ossutil cp -r ./Qwen3-8B oss:// /models/Qwen3-8B
Next, create a PersistentVolume (PV) and PersistentVolumeClaim (PVC) named llm-model that point to the OSS bucket, using a secret for access keys. The YAML manifests are provided in the source.
Step 2: Deploy the inference service – apply a Kubernetes Deployment named qwen3 that mounts the PVC and runs the vllm container with the appropriate command line arguments, GPU, CPU, and memory resources. A corresponding Service exposing port 8000 is also defined.
Step 3: Verify the service – forward the service port locally with kubectl port-forward svc/qwen3 8000:8000 and send a test request via curl to the /v1/chat/completions endpoint. Expected forwarding messages and JSON response are shown.
The guide also mentions that ACK Pro clusters can use ACS GPU resources via virtual nodes, requiring the label alibabacloud.com/compute-class: gpu in the pod metadata. A simplified deployment snippet for ACS is included.
Finally, a list of reference links is provided for adding GPU node pools, connecting with kubectl, installing git-lfs , installing ossutil , creating ACK managed clusters, and accessing the ACS console.
Alibaba Cloud Infrastructure
For uninterrupted computing services
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.