How to Deploy GPUStack with Docker for Scalable AI Model Serving
This guide walks you through installing NVIDIA drivers, Docker, and the NVIDIA Container Toolkit, then shows step‑by‑step how to run GPUStack in Docker, expand a GPU cluster, and serve large language, multimodal, diffusion, and embedding models with OpenAI‑compatible APIs.
Docker Tutorial for Running GPUStack
GPUStack is an open‑source GPU cluster manager designed for AI workloads. It supports a wide range of hardware, multiple model families (LLMs, VLMs, diffusion, audio, embedding, reranking), and offers distributed inference with back‑ends such as llama-box, vox-box, and vLLM. The lightweight Python package provides an OpenAI‑compatible API, real‑time GPU monitoring, token usage tracking, and simple user/API‑key management.
Key Features
Broad hardware compatibility : manages GPUs on Apple Metal (M‑series), NVIDIA CUDA, AMD ROCm, Huawei Ascend (CANN), MooreThreads MUSA, and HaiGuang DTK.
Extensive model support : LLMs (Qwen, LLaMA, Mistral, DeepSeek, Phi, Yi), multimodal VLMs (Llama‑3.2‑Vision, Pixtral, Qwen‑2‑VL, LLaVA, InternVL2.5), diffusion models (Stable Diffusion, FLUX), speech models (Whisper, CosyVoice), embedding and reranking models (BGE, BCE, Jina).
Heterogeneous GPU & scaling : add mixed‑GPU nodes on‑the‑fly, scale compute power as needed.
Distributed inference : single‑node multi‑GPU and multi‑node multi‑GPU parallel inference.
Multiple inference back‑ends : llama-box (based on llama.cpp), vox-box, vLLM.
Lightweight Python package : minimal dependencies and overhead.
OpenAI‑compatible API : standard REST endpoints for model serving.
User & API‑key management : simplified credential handling.
GPU metrics monitoring : real‑time performance and utilization.
Token usage & rate‑limit statistics : accurate tracking and enforcement.
Supported Hardware Platforms
Apple Metal (M‑series chips)
NVIDIA CUDA (compute capability 6.0+)
AMD ROCm
Huawei Ascend (CANN)
MooreThreads MUSA
HaiGuang DTK
Supported Model Types
Large Language Models (LLMs): Qwen, LLaMA, Mistral, DeepSeek, Phi, Yi, etc.
Multimodal Vision‑Language Models (VLMs): Llama‑3.2‑Vision, Pixtral, Qwen‑2‑VL, LLaVA, InternVL2.5.
Diffusion Models: Stable Diffusion, FLUX.
Audio Models: Whisper (speech‑to‑text), CosyVoice (text‑to‑speech).
Embedding Models: BGE, BCE, Jina.
Reranking Models: BGE, BCE, Jina.
Usage Scenarios
GPUStack is ideal for environments that need efficient GPU resource management and scheduling for AI inference, supporting both single‑node multi‑GPU and multi‑node clusters with various back‑ends.
Step‑by‑Step Tutorial
1. Environment Preparation
Hardware & System Requirements
Ensure an NVIDIA GPU is installed; driver compatible with CUDA 11.0+.
Recommended OS: Ubuntu 22.04 LTS or CentOS 7+.
Verify GPU & Dependencies
# Check NVIDIA GPU detection
lspci | grep -i nvidia
# Verify GCC version
gcc --version2. Install NVIDIA Driver & Docker
Install NVIDIA Driver
# Install kernel headers
sudo apt-get install linux-headers-$(uname -r)
# Add CUDA repository and install driver
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install nvidia-driver-535 -y
sudo reboot
# Verify driver
nvidia-smiInstall Docker Engine
# Remove old Docker versions
sudo apt-get remove docker.io docker-doc containerd
# Add Docker official repo
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y
# Verify Docker
docker infoConfigure NVIDIA Container Toolkit
# Add repository and install toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit -y
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Test CUDA container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi3. Deploy GPUStack Container
docker run -d \
--gpus all \
-p 890:80 \
--ipc=host \
--name gpustack \
-v gpustack-data:/var/lib/gpustack \
gpustack/gpustack:latestParameter notes : --gpus all: expose all GPU devices. --ipc=host: share host IPC namespace for better performance. -v gpustack-data: persist configuration and model data.
4. Retrieve Initial Admin Password
docker exec -it gpustack cat /var/lib/gpustack/initial_admin_passwordAccess the UI at http://<server‑IP> using admin and the retrieved password (change it on first login).
5. Expand GPU Cluster
Obtain a token from the master node:
docker exec -it gpustack cat /var/lib/gpustack/tokenRun a worker node:
docker run -d \
--gpus all \
--network=host \
--ipc=host \
gpustack/gpustack \
--server-url http://<master‑IP> \
--token <token‑from‑master>6. Functional Usage Examples
Deploy a Large Model : In the GPUStack console, go to the Models page and import a model from Hugging Face or a local path (e.g., Llama‑3.2). The system automatically allocates GPU resources and creates an API endpoint.
Playground Testing : Use the built‑in Playground to test multimodal models (Stable Diffusion), text embeddings (BERT), and compare multiple models with parameter tuning.
7. Common Issues
GPU not recognized : run nvidia-smi and verify Docker runtime configuration.
Container fails to start : ensure --ipc=host is set and the persistent volume is mounted.
Network problems : open firewall port 80 and internal RPC port 6789 for cross‑node communication.
8. References
GPUStack official Docker deployment documentation.
NVIDIA Container Toolkit configuration guide.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
