How to Quickly Deploy DeepSeek‑R1‑Distill on Volcengine Cloud: Three Practical Methods
This article explains how to deploy DeepSeek's open‑source large language models—especially DeepSeek‑R1‑Distill—on Volcengine Cloud using three approaches: a containerized VKE solution, a serverless veFaaS setup, and a one‑click Terraform script, complete with step‑by‑step instructions, code snippets, and configuration tips.
Overview
DeepSeek, a Chinese AI startup, provides several open‑source large language models, including DeepSeek‑V3, DeepSeek‑R1, and the lightweight DeepSeek‑R1‑Distill. This guide presents three fast deployment solutions for DeepSeek‑R1‑Distill on Volcengine Cloud.
DeepSeek Models
DeepSeek‑V3: 671B‑parameter MoE model trained on 14.8T tokens, suitable for information retrieval, data analysis, and NLP tasks.
DeepSeek‑R1: Inference model derived from DeepSeek‑V3‑Base; excels at mathematics, code, and reasoning.
DeepSeek‑R1‑Distill: Distilled version of DeepSeek‑R1, optimized for resource‑constrained environments.
Solution 1: Containerized Deployment (VKE)
Use GPU cloud servers, Volcengine Container Service (VKE), and Continuous Delivery (CP) to achieve horizontal scaling.
Step 1 – Create VKE Cluster
Visit the VKE console ( https://console.volcengine.com/vke ) and create a managed cluster with VPC‑CNI networking.
Step 2 – Create Deployment Resource
In the CP console ( https://console.volcengine.com/cp ), select "Resource Management – Deploy Resource", create a new deployment, choose the VKE cluster created earlier, and set the component configuration to include csi-tos and nvidia-device-plugin.
Step 3 – Create AI Application
Create a custom AI application in CP, select the SGLang image, and mount the DeepSeek‑R1‑Distill model at /model.
python3 -m sglang.launch_server --model-path /model --context-length 2048 --trust-remote-code --host 0.0.0.0 --port 8080

Step 4 – Configure Load Balancer
Expose the service via Volcengine CLB (public or private) and obtain the public IP address.
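Once the CLB address is available, the deployment can be smoke-tested against SGLang's OpenAI-compatible endpoint. A minimal sketch, assuming the service listens on port 8080 as configured above; `<clb-public-ip>` is a placeholder for the address obtained from the CLB console:

```shell
# Smoke-test the SGLang server behind the CLB.
# <clb-public-ip> is a placeholder -- substitute the address from the CLB console.
CLB_IP="<clb-public-ip>"
PAYLOAD='{"model":"/model","messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
curl -s --max-time 10 -X POST "http://${CLB_IP}:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "request failed (check CLB listener and security-group rules)"
```

A successful response returns an OpenAI-style JSON body with a `choices` array.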
Solution 2: Serverless Deployment (veFaaS)
Leverage veFaaS for a fully managed, auto‑scaling inference service.
Step 1 – Create Ollama Inference Service
In the veFaaS console, create a Web App using the public Ollama image with GPU acceleration.
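Before inference can run, a model tag must be pulled into the Ollama service. A sketch using Ollama's HTTP API, assuming the 7B distill tag (pick the size that fits your GPU) and that the service is reachable at Ollama's default port 11434:

```shell
# Pull a DeepSeek-R1 distill tag into the running Ollama service.
# Assumptions: service reachable at localhost:11434; the 7b tag fits the GPU.
curl -s http://localhost:11434/api/pull \
  -d '{"name":"deepseek-r1:7b"}' || echo "Ollama not reachable"
```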
Step 2 – Create API Gateway Endpoint
Use Volcengine API Gateway (APIG) to expose the Ollama service.
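The gateway endpoint can then be exercised directly with Ollama's generate API. A sketch, assuming APIG forwards requests to Ollama unchanged; `<apig-address>` is a placeholder for your gateway address:

```shell
# Call Ollama through the API Gateway endpoint.
# <apig-address> is a placeholder for the APIG address created in this step.
GATEWAY="<apig-address>"
curl -s --max-time 30 "${GATEWAY}/api/generate" \
  -d '{"model":"deepseek-r1:7b","prompt":"Why is the sky blue?","stream":false}' \
  || echo "request failed (check gateway routing)"
```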
Step 3 – Test Locally with open‑webui
Run open‑webui in Docker, pointing it to the Ollama endpoint.
docker run -p 3000:8080 \
-v open-webui:/app/backend/data \
--name open-webui-new \
--restart always \
--env OLLAMA_BASE_URL=[API gateway address] \
--env ENABLE_OPENAI_API=false \
--env ENABLE_WEBSOCKET_SUPPORT=false \
--env ENABLE_RAG_WEB_SEARCH=true \
--env RAG_WEB_SEARCH_ENGINE=duckduckgo \
--env WEBUI_AUTH=false \
--env RAG_RERANKING_MODEL_AUTO_UPDATE=false \
--env WHISPER_MODEL_AUTO_UPDATE=false \
--env RAG_EMBEDDING_MODEL_AUTO_UPDATE=false \
ghcr.io/open-webui/open-webui:main

Solution 3: One‑Click Terraform Deployment
Use Terraform to provision all required resources (VPC, subnet, security groups, ECS GPU instance, CLB, etc.) and run a startup script that installs Docker, NVIDIA drivers, and launches the model container.
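The startup script itself is not shown in the source; a minimal sketch of what it might contain, assuming an Ubuntu image with the NVIDIA driver preinstalled, model weights already downloaded to /models, and vLLM as the serving engine (container name, ports, and engine are illustrative, not the script's actual contents):

```shell
#!/bin/bash
# Hypothetical ECS user-data sketch -- adapt image, driver, and engine to your setup.
# nvidia-container-toolkit requires NVIDIA's apt repository; repo setup omitted here.
set -e
apt-get update && apt-get install -y docker.io nvidia-container-toolkit
systemctl enable --now docker
# Serve the model with an OpenAI-compatible API on port 80 (vLLM listens on 8000).
docker run -d --name deepseek --gpus all --restart always \
  -p 80:8000 -v /models:/models \
  vllm/vllm-openai:latest --model /models --max-model-len 2048
```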
terraform {
  required_providers {
    volcengine = {
      source  = "volcengine/volcengine"
      version = "0.0.159"
    }
  }
}

resource "volcengine_vpc" "foo" {
  vpc_name   = "acc-test-vpc"
  cidr_block = "172.16.0.0/16"
}

... (remaining Terraform resources) ...

Step 3 – Verify
After the ECS instance is ready, run the following curl command to test the model:
curl -X POST http://<your‑lb‑ip>/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"/models","messages":[{"role":"user","content":"Your question"}],"stream":false,"temperature":0.7}'

Conclusion
The guide demonstrates end‑to‑end deployment of DeepSeek‑R1‑Distill on Volcengine using containerized, serverless, and IaC (Terraform) approaches, enabling developers to quickly launch inference services and integrate them into their applications.
This article has been distilled and summarized from source material and republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
