One-Click GPU-Enabled Kind Cluster Setup for Running Large AI Models
This tutorial walks you through using a one‑click script to create a GPU‑enabled Kind Kubernetes cluster, evenly distribute GPU resources across nodes with nvkind, install necessary drivers and toolkits, deploy a vLLM‑served large language model, and verify its operation, all on a local or cloud environment.
Prerequisites
The full script is available on GitHub and assumes an Ubuntu host. You also need:
A machine with an NVIDIA GPU
Administrator (sudo) privileges
A stable internet connection
If you lack a local GPU server, you can rent a GPU instance from cloud providers such as Alibaba Cloud, AWS, or Azure; Alibaba Cloud often offers the most cost‑effective option.
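The prerequisites above can be checked before running anything. The following is a minimal sketch, not part of the original script: the function names `is_ubuntu`, `has_nvidia_gpu`, and `preflight` are hypothetical, and it assumes `lspci` is available to list PCI devices.

```shell
# Hypothetical preflight helpers; these are NOT part of install.sh.

# Succeeds if the given os-release file declares ID=ubuntu.
is_ubuntu() {
  os_release="${1:-/etc/os-release}"
  [ -r "$os_release" ] && grep -Eq '^ID="?ubuntu"?$' "$os_release"
}

# Succeeds if the given lspci-style listing mentions an NVIDIA device.
has_nvidia_gpu() {
  printf '%s\n' "$1" | grep -qi 'nvidia'
}

# Runs both checks against the current host and reports the first failure.
preflight() {
  is_ubuntu || { echo "Host does not look like Ubuntu." >&2; return 1; }
  has_nvidia_gpu "$(lspci 2>/dev/null)" || { echo "No NVIDIA device found by lspci." >&2; return 1; }
  # sudo -n fails instead of prompting, so this only detects passwordless sudo.
  sudo -n true 2>/dev/null || echo "Note: sudo may prompt for a password." >&2
  echo "Preflight checks passed."
}
```

Calling `preflight` before `bash install.sh` would surface a missing GPU or an unsupported distribution early, rather than partway through the installation.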
Run the Setup Script
Execute the following command; the script will install all dependencies and create a GPU‑enabled Kind cluster.
bash install.sh
Script Details
Install Command‑Line Tools
Docker, kubectl, Helm, Kind, and nvkind are installed.
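Note that the kind download in this script only runs when `uname -m` reports `x86_64`; on any other machine the later `chmod` would fail because no binary was fetched. As a hedged sketch, the machine type could be mapped to the suffix used in kind release file names. The helper names `kind_arch` and `kind_url` are hypothetical, and the nvkind tarball used by the script is still amd64-only, so this covers the kind step alone.

```shell
# Hypothetical helpers; not part of the original script.

# Maps `uname -m` output to the architecture suffix in kind release file names.
kind_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Builds the download URL for a kind version (e.g. v0.25.0) and machine type.
kind_url() {
  arch="$(kind_arch "$2")" || return 1
  echo "https://kind.sigs.k8s.io/dl/$1/kind-linux-$arch"
}

# Example usage (not executed here):
#   curl -Lo ./kind "$(kind_url v0.25.0 "$(uname -m)")"
```

Failing loudly on an unknown architecture is a deliberate choice here: it stops the script before it tries to install a binary that was never downloaded.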
sudo apt update
sudo apt install -y docker.io
sudo snap install kubectl --classic
# Add kubectl completion to bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc
sudo snap install helm --classic
# Install kind
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# Install nvkind
curl -L -o ~/nvkind-linux-amd64.tar.gz https://github.com/Jeffwan/kind-with-gpus-examples/releases/download/v0.1.0/nvkind-linux-amd64.tar.gz
tar -xzvf ~/nvkind-linux-amd64.tar.gz
sudo mv nvkind-linux-amd64 /usr/local/bin/nvkind
Install NVIDIA GPU Driver
The script installs NVIDIA driver version 565.57.01.
wget https://cn.download.nvidia.com/tesla/565.57.01/NVIDIA-Linux-x86_64-565.57.01.run
sudo sh NVIDIA-Linux-x86_64-565.57.01.run --silent
Install and Configure NVIDIA Container Toolkit
The toolkit mounts NVIDIA devices into containers.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker
Verify GPU Availability
Run the following checks:
Execute nvidia-smi to list GPUs.
Run a Docker container with the NVIDIA runtime to ensure GPU detection.
Confirm containers can access GPU devices.
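The third check depends on the `accept-nvidia-visible-devices-as-volume-mounts` setting that the toolkit step enabled. A minimal sketch of confirming that setting directly is shown here; the function name `volume_mounts_enabled` is hypothetical, and the default path is an assumption taken from the comment in the cluster template (`/etc/nvidia-container-runtime/config.toml`).

```shell
# Hypothetical check; not part of the original script.
# Succeeds if the runtime config enables GPU selection via volume mounts.
volume_mounts_enabled() {
  config="${1:-/etc/nvidia-container-runtime/config.toml}"
  [ -r "$config" ] &&
    grep -Eq '^[[:space:]]*accept-nvidia-visible-devices-as-volume-mounts[[:space:]]*=[[:space:]]*true' "$config"
}

# Example usage (not executed here):
#   volume_mounts_enabled || echo "volume-mount device selection is not enabled" >&2
```

If this check fails, the `/dev/null` mount test is likely to fail as well, so it can help narrow down whether the problem is the toolkit configuration or the driver.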
# Run nvidia-smi to list GPU devices
nvidia-smi -L
if [ $? -ne 0 ]; then
    echo "nvidia-smi failed to execute."
    exit 1
fi
# Run a Docker container with NVIDIA runtime
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L
if [ $? -ne 0 ]; then
    echo "Docker command with NVIDIA runtime failed to execute."
    exit 1
fi
# Run a Docker container with mounted /dev/null to check GPU accessibility
docker run -v /dev/null:/var/run/nvidia-container-devices/all ubuntu:20.04 nvidia-smi -L
if [ $? -ne 0 ]; then
    echo "Docker command with mounted /dev/null failed to execute."
    exit 1
fi
Create the Kind GPU Cluster
A configuration file is generated based on the number of GPUs, then nvkind creates the cluster.
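To make the "one worker per GPU" idea concrete, the sketch below emulates in plain shell what the template's `range ... until numGPUs` loop is expected to produce: one worker node per GPU index, starting at 0. This is an illustration under that assumption, not nvkind's actual rendering code, and the helper names `count_gpus` and `render_config` are hypothetical.

```shell
# Hypothetical helpers illustrating the template expansion; nvkind does this itself.

# Counts GPUs in `nvidia-smi -L`-style output (one "GPU <n>: ..." line per device).
count_gpus() {
  printf '%s\n' "$1" | grep -c '^GPU [0-9][0-9]*:'
}

# Prints a kind config with one worker per GPU index 0..N-1.
render_config() {
  printf '%s\n' "kind: Cluster" "apiVersion: kind.x-k8s.io/v1alpha4" "nodes:" "- role: control-plane"
  i=0
  while [ "$i" -lt "$1" ]; do
    printf '%s\n' "- role: worker" "  extraMounts:" "    - hostPath: /dev/null"
    printf '%s\n' "      containerPath: /var/run/nvidia-container-devices/$i"
    i=$((i + 1))
  done
}

# Example usage (not executed here):
#   render_config "$(count_gpus "$(nvidia-smi -L)")"
```

On a host with two GPUs this would print two worker entries, mounted at `/var/run/nvidia-container-devices/0` and `/var/run/nvidia-container-devices/1`, so each worker node sees exactly one GPU.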
cat <<'EOF' > one-worker-per-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
{{- range $gpu := until numGPUs }}
- role: worker
  extraMounts:
    # We inject all NVIDIA GPUs using the nvidia-container-runtime.
    # This requires `accept-nvidia-visible-devices-as-volume-mounts = true` in `/etc/nvidia-container-runtime/config.toml`
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/{{ $gpu }}
{{- end }}
EOF
nvkind cluster create --name gpu-cluster --config-template=one-worker-per-gpu.yaml
Install NVIDIA GPU Operator
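The operator can automate installation of the driver, device plugin, and DCGM exporter. Here driver installation is turned off with driver.enabled=false, because the script has already installed the driver on the host.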
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false
Install Cloud Provider Kind
This component enables LoadBalancer‑type service exposure.
# KIND_CLOUD_PROVIDER_URL must point to a cloud-provider-kind release tarball (its value is not shown in this excerpt)
curl -L "${KIND_CLOUD_PROVIDER_URL}" -o cloud-provider-kind.tar.gz
tar -xvzf cloud-provider-kind.tar.gz
chmod +x cloud-provider-kind
sudo mv cloud-provider-kind /usr/local/bin/
echo "Starting cloud-provider-kind in the background..."
LOG_FILE="/tmp/cloud-provider-kind.log"
nohup cloud-provider-kind > "$LOG_FILE" 2>&1 &
# Record the background process ID so it can be stopped later
echo $! > /tmp/cloud-provider-kind.pid
echo "Setup complete. All components have been installed successfully."
Run a Large Model with vLLM
The DeepSeek‑R1‑Distill‑Qwen‑1.5B model is deployed to verify the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
  labels:
    app: deepseek-r1-distill-qwen-1-5b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-r1-distill-qwen-1-5b
  template:
    metadata:
      labels:
        app: deepseek-r1-distill-qwen-1-5b
    spec:
      volumes:
        - name: cache-volume
          persistentVolumeClaim:
            claimName: deepseek-r1-distill-qwen-1-5b
      containers:
        - name: deepseek-r1-distill-qwen-1-5b
          image: vllm/vllm-openai:latest
          command: ["/bin/sh", "-c"]
          args:
            - "vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 5
          volumeMounts:
            - mountPath: /root/.cache/huggingface
              name: cache-volume
---
apiVersion: v1
kind: Service
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  ports:
    - name: deepseek-r1-distill-qwen-1-5b
      port: 80
      protocol: TCP
      targetPort: 8000
  selector:
    app: deepseek-r1-distill-qwen-1-5b
  type: LoadBalancer
The vLLM image (~8 GB) takes time to download; the first pod start downloads model weights (≈508 s). Subsequent restarts load the cached weights in ~0.55 s.
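Because the first start can take several minutes, it may help to poll for readiness instead of re-running the status command by hand. The following is a minimal sketch: `retry_until` and `deployment_ready` are hypothetical helper names, and the attempt count and pause in the usage comment are arbitrary assumptions rather than values from the original script.

```shell
# Hypothetical helpers; not part of the original script.

# retry_until <max_attempts> <sleep_seconds> <command...>
# Re-runs the command until it succeeds or the attempts run out.
retry_until() {
  max="$1"; pause="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$pause"
  done
}

# Succeeds once the named Deployment has finished rolling out.
deployment_ready() {
  kubectl rollout status "deployment/$1" --timeout=10s >/dev/null 2>&1
}

# Example usage (not executed here): up to 90 attempts, 10 s apart.
#   retry_until 90 10 deployment_ready deepseek-r1-distill-qwen-1-5b
```

The Deployment only counts as rolled out once its readiness probe on `/health` passes, so this waits for both the image pull and the model download.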
# Check pod status
kubectl get pod
# Check service
kubectl get svc
# Access the model via the LoadBalancer IP (replace 172.18.0.4 with the EXTERNAL-IP reported by kubectl get svc)
curl --location 'http://172.18.0.4/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
        "model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        "messages":[{"role":"user","content":"Who are you?"}]
    }'
The response shows the model successfully processed the request.
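The address 172.18.0.4 in the request above is specific to one environment. A minimal sketch of looking it up from the Service instead is shown here; `service_ip` and `chat_url` are hypothetical helper names, and it assumes cloud-provider-kind has already assigned an IP address (rather than a hostname) to the Service.

```shell
# Hypothetical helpers; not part of the original script.

# Prints the external IP that the load balancer assigned to a Service.
service_ip() {
  kubectl get svc "$1" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Builds the chat completions endpoint URL for a given address.
chat_url() {
  echo "http://$1/v1/chat/completions"
}

# Example usage (not executed here):
#   ip="$(service_ip deepseek-r1-distill-qwen-1-5b)"
#   curl --location "$(chat_url "$ip")" --header 'Content-Type: application/json' --data @request.json
# request.json is a hypothetical file holding the same JSON body as the request above.
```

If `service_ip` prints nothing, the Service has probably not received an address yet; checking the cloud-provider-kind log at /tmp/cloud-provider-kind.log is a reasonable next step.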
Cleanup
When testing is complete, remove the cluster with:
bash cleanup.sh
Conclusion
This guide demonstrates how a one‑click script can rapidly provision a GPU‑enabled Kind cluster for large‑model development and testing, using nvkind for balanced GPU allocation and vLLM to serve the DeepSeek‑R1‑Distill‑Qwen‑1.5B model.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
