Deploy DeepSeek‑R1 LLM on Alibaba Cloud ACK One with ACS GPU in Minutes
This guide walks you through deploying the DeepSeek‑R1 large language model as an inference service on Alibaba Cloud ACK One registered clusters using ACS GPU compute. It covers model preparation, OSS storage setup, PersistentVolume configuration, Arena-based service deployment, and verification, with concrete commands and parameters throughout.
Background
DeepSeek‑R1 is a high‑performance large language model (LLM) optimized for mathematical reasoning, coding challenges, and general question answering. Running it on‑premises is often constrained by local GPU capacity; with Alibaba Cloud ACK One registered clusters and ACS GPU compute, you can instead scale inference workloads into the cloud.
Prerequisites
Access to Alibaba Cloud Container Service (ACK) console and ACS GPU resources.
git‑lfs and ossutil installed.
Arena client configured.
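Before starting, it can help to confirm the client tooling is actually on PATH. A minimal sketch (it only checks tool presence, not versions or credentials):

```shell
#!/bin/sh
# Report which of the required client tools are installed.
check_tools() {
  for tool in git git-lfs ossutil arena kubectl; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check_tools
```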
Step 1 – Prepare the model
Clone the DeepSeek‑R1‑Distill‑Qwen‑7B repository from ModelScope and pull the LFS files.
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
cd DeepSeek-R1-Distill-Qwen-7B
git lfs pull
Upload the model directory to an OSS bucket.
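Before pushing several gigabytes of weights to OSS, it is worth verifying that the LFS pull actually materialized the weight files rather than leaving pointer stubs. A minimal sketch, assuming the usual Hugging Face / ModelScope layout (config.json plus *.safetensors or *.bin shards — the file names and the check_model_dir helper are illustrative, not part of any tool):

```python
from pathlib import Path


def check_model_dir(model_dir: str, min_weight_bytes: int = 1024) -> list[str]:
    """Return a list of problems found in a downloaded model directory.

    Files smaller than min_weight_bytes are likely un-pulled git-lfs
    pointer stubs rather than real weight shards.
    """
    root = Path(model_dir)
    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json missing")
    weights = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    if not weights:
        problems.append("no weight shards (*.safetensors / *.bin) found")
    for w in weights:
        if w.stat().st_size < min_weight_bytes:
            problems.append(f"{w.name} looks like an LFS pointer stub")
    return problems
```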
ossutil mkdir oss://my-bucket/models/DeepSeek-R1-Distill-Qwen-7B
cd ..
ossutil cp -r ./DeepSeek-R1-Distill-Qwen-7B oss://my-bucket/models/DeepSeek-R1-Distill-Qwen-7B
Step 2 – Create PersistentVolume and PersistentVolumeClaim
Define a static OSS‑backed PersistentVolume (PV) using the ossplugin.csi.alibabacloud.com driver and a matching PersistentVolumeClaim (PVC) named llm-model. Example YAML:
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
stringData:
  akId: <your-oss-ak>
  akSecret: <your-oss-sk>
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: llm-model
  labels:
    alicloud-pvname: llm-model
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: llm-model
    nodePublishSecretRef:
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: my-bucket
      url: oss-cn-hangzhou-internal.aliyuncs.com
      otherOpts: "-o umask=022 -o max_stat_cache_size=0 -o allow_other"
      path: /models/DeepSeek-R1-Distill-Qwen-7B
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-model
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 30Gi
  selector:
    matchLabels:
      alicloud-pvname: llm-model
Step 3 – Deploy the inference service with Arena
Run a custom serving job that uses ACS GPU resources. The command specifies GPU count, CPU, memory, required labels, Docker image, and mounts the PVC.
arena serve custom \
--name=deepseek-r1 \
--version=v1 \
--gpus=1 \
--cpu=8 \
--memory=32Gi \
--replicas=1 \
--env-from-secret=akId=oss-secret \
--env-from-secret=akSecret=oss-secret \
--label=alibabacloud.com/acs="true" \
--label=alibabacloud.com/compute-class=gpu \
--label=alibabacloud.com/gpu-model-series=example-model \
--restful-port=8000 \
--readiness-probe-action="tcpSocket" \
--readiness-probe-action-option="port: 8000" \
--readiness-probe-option="initialDelaySeconds: 30" \
--readiness-probe-option="periodSeconds: 30" \
--image=registry-cn-hangzhou-vpc.ack.aliyuncs.com/ack-demo/vllm:v0.6.6 \
--data=llm-model:/model/DeepSeek-R1-Distill-Qwen-7B \
"vllm serve /model/DeepSeek-R1-Distill-Qwen-7B --port 8000 --trust-remote-code --served-model-name deepseek-r1 --max-model-len 32768 --gpu-memory-utilization 0.95 --enforce-eager"
Key labels required for ACS GPU:
--label=alibabacloud.com/acs="true" --label=alibabacloud.com/compute-class=gpu --label=alibabacloud.com/gpu-model-series=example-model
Step 4 – Verify the service
Check the job status:
arena serve get deepseek-r1
Confirm the pod is scheduled on a virtual node:
kubectl get po -owide | grep deepseek-r1-v1
Port‑forward the service to the local machine:
kubectl port-forward svc/deepseek-r1-v1 8000:8000
Send a test request:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-r1","messages":[{"role":"user","content":"Hello, DeepSeek."}],"max_tokens":100,"temperature":0.7,"top_p":0.9,"seed":10}'
The response is a JSON object containing the model’s answer.
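The same request can be scripted against the OpenAI-compatible endpoint that vLLM exposes. A minimal stdlib-only sketch (the build_request, extract_answer, and ask helpers plus the API_URL constant are ours, not part of vLLM or Arena; the URL assumes the port-forward above is running):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # via kubectl port-forward


def build_request(prompt: str, model: str = "deepseek-r1", max_tokens: int = 100) -> bytes:
    """Build the same JSON body the curl example sends."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
        "seed": 10,
    }
    return json.dumps(body).encode("utf-8")


def extract_answer(response_json: dict) -> str:
    """Pull the assistant message out of an OpenAI-style chat completion."""
    return response_json["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """POST the prompt to the port-forwarded service and return the answer."""
    req = urllib.request.Request(
        API_URL,
        data=build_request(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_answer(json.load(resp))
```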