How to Quickly Deploy DeepSeek‑R1‑Distill on Volcengine Cloud: Three Practical Methods

This article explains how to deploy DeepSeek's open‑source large language models—especially DeepSeek‑R1‑Distill—on Volcengine Cloud using three approaches: a containerized VKE solution, a serverless veFaaS setup, and a one‑click Terraform script, complete with step‑by‑step instructions, code snippets, and configuration tips.

Overview

DeepSeek, a Chinese AI startup, provides several open‑source large language models, including DeepSeek‑V3, DeepSeek‑R1, and the lightweight DeepSeek‑R1‑Distill. This guide presents three fast deployment solutions for DeepSeek‑R1‑Distill on Volcengine Cloud.

DeepSeek Models

DeepSeek‑V3: A 671B‑parameter MoE model trained on 14.8T tokens, suited to information retrieval, data analysis, and general NLP tasks.

DeepSeek‑R1: A reasoning model derived from DeepSeek‑V3‑Base that excels at mathematics, coding, and multi‑step reasoning.

DeepSeek‑R1‑Distill: A set of smaller models distilled from DeepSeek‑R1, optimized for resource‑constrained environments.

Solution 1: Containerized Deployment (VKE)

This solution combines GPU cloud servers, the Volcengine container service VKE, and Continuous Delivery (CP) to run the inference service with horizontal scaling.

Step 1 – Create VKE Cluster

Visit the VKE console (https://console.volcengine.com/vke) and create a managed cluster with VPC‑CNI networking.

VKE cluster creation UI

Step 2 – Create Deployment Resource

In the CP console (https://console.volcengine.com/cp), select "Resource Management – Deploy Resource", create a new deployment, choose the VKE cluster created earlier, and set the component configuration to include csi-tos and nvidia-device-plugin.

Component selection UI
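
Before moving on, it is worth confirming that nvidia-device-plugin has registered the GPUs with the cluster. A quick check, assuming kubectl is configured against the new VKE cluster:

kubectl describe nodes | grep nvidia.com/gpu   # capacity/allocatable GPU counts should be non-zero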

Step 3 – Create AI Application

Create a custom AI application in CP, select the SGLang image, and mount the DeepSeek‑R1‑Distill model at /model. Start the server with:

python3 -m sglang.launch_server \
  --model-path /model \
  --context-length 2048 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8080
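
Once the pod is running, the service can be sanity-checked from inside the cluster (for example via kubectl port-forward). This assumes SGLang's standard health and OpenAI-compatible endpoints; <pod-ip> is an illustrative placeholder:

curl http://<pod-ip>:8080/health      # returns HTTP 200 once the model is loaded
curl http://<pod-ip>:8080/v1/models   # lists the served model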

Step 4 – Configure Load Balancer

Expose the service via Volcengine CLB (public or private) and obtain the public IP address.

Load balancer configuration UI
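
With the CLB in place, the service can be called from outside the cluster. A minimal sketch, assuming the listener forwards traffic to the SGLang port, with <clb-public-ip> standing in for the address obtained above:

curl -X POST http://<clb-public-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/model", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'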

Solution 2: Serverless Deployment (veFaaS)

Leverage veFaaS for a fully managed, auto‑scaling inference service.

Step 1 – Create Ollama Inference Service

In the veFaaS console, create a Web App using the public Ollama image with GPU acceleration.

veFaaS Ollama service UI

Step 2 – Create API Gateway Endpoint

Use Volcengine API Gateway (APIG) to expose the Ollama service.

APIG creation UI
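
Once the gateway endpoint is live, the Ollama HTTP API becomes reachable through it. A quick test, with <apig-domain> as an illustrative placeholder and assuming the distilled deepseek-r1:7b tag from the Ollama library:

curl http://<apig-domain>/api/tags                                  # list models already available
curl http://<apig-domain>/api/pull -d '{"model": "deepseek-r1:7b"}' # pull a distilled model if absent
curl http://<apig-domain>/api/chat \
  -d '{"model": "deepseek-r1:7b", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'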

Step 3 – Test Locally with open‑webui

Run open‑webui in Docker, pointing it to the Ollama endpoint.

docker run -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui-new \
  --restart always \
  --env OLLAMA_BASE_URL=[API gateway address] \
  --env ENABLE_OPENAI_API=false \
  --env ENABLE_WEBSOCKET_SUPPORT=false \
  --env ENABLE_RAG_WEB_SEARCH=true \
  --env RAG_WEB_SEARCH_ENGINE=duckduckgo \
  --env WEBUI_AUTH=false \
  --env RAG_RERANKING_MODEL_AUTO_UPDATE=false \
  --env WHISPER_MODEL_AUTO_UPDATE=false \
  --env RAG_EMBEDDING_MODEL_AUTO_UPDATE=false \
  ghcr.io/open-webui/open-webui:main
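
Once the container starts, the UI is available at http://localhost:3000 (host port 3000 maps to the container's 8080); with WEBUI_AUTH=false no login is required, and the models served through the gateway should appear in the model selector.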

Solution 3: One‑Click Terraform Deployment

Use Terraform to provision all required resources (VPC, subnet, security groups, ECS GPU instance, CLB, etc.) and run a startup script that installs Docker and the NVIDIA drivers, then launches the model container.

Step 1 – Write the Terraform Configuration

terraform {
  required_providers {
    volcengine = {
      source  = "volcengine/volcengine"
      version = "0.0.159"
    }
  }
}

resource "volcengine_vpc" "foo" {
  vpc_name   = "acc-test-vpc"
  cidr_block = "172.16.0.0/16"
}

... (remaining Terraform resources) ...
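
Step 2 – Initialize and Apply

A minimal workflow, assuming Terraform is installed locally and that the provider reads credentials and region from environment variables (the values below are illustrative placeholders):

export VOLCENGINE_ACCESS_KEY=<your-access-key>
export VOLCENGINE_SECRET_KEY=<your-secret-key>
export VOLCENGINE_REGION=<your-region>
terraform init    # downloads the pinned volcengine provider
terraform plan    # previews the VPC, subnet, security group, ECS, and CLB resources
terraform apply   # provisions everything and runs the startup script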

Step 3 – Verify

After the ECS instance is ready, run the following curl command to test the model:

curl -X POST http://<your-lb-ip>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models", "messages": [{"role": "user", "content": "Your question"}], "stream": false, "temperature": 0.7}'

Conclusion

This guide demonstrated end-to-end deployment of DeepSeek‑R1‑Distill on Volcengine Cloud using containerized (VKE), serverless (veFaaS), and infrastructure-as-code (Terraform) approaches, enabling developers to quickly launch inference services and integrate them into their applications.
