How to Deploy DeepSeek‑R1‑Distill Models on Alibaba Cloud CAP (Ollama & Transformer)
This guide walks you through deploying various DeepSeek‑R1‑Distill models on Alibaba Cloud's Serverless AI platform CAP, covering supported models, deployment options (Ollama and Transformer), step‑by‑step template and model‑service setups, validation methods, and tips for adding custom models.
Background
DeepSeek‑R1‑Distill models are compact, low‑cost variants of the DeepSeek‑R1 family that retain strong benchmark performance. Alibaba Cloud’s Cloud Application Platform (CAP) provides a Serverless + AI environment where these models can be deployed as either application templates or dedicated model services.
Supported Models
DeepSeek‑R1‑Distill‑Qwen‑1.5B – Transformer – Tesla 16GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑7B – Transformer – Tesla 16GB – 131072 tokens
DeepSeek‑R1‑Distill‑Llama‑8B – Transformer – Tesla 16GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑14B – Transformer – Ada 48GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑32B – Transformer – Ada 48GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑1.5B‑GGUF – Ollama – Tesla 8GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑7B‑GGUF – Ollama – Tesla 16GB – 131072 tokens
DeepSeek‑R1‑Distill‑Llama‑8B‑GGUF – Ollama – Tesla 16GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑14B‑GGUF – Ollama – Ada 48GB – 131072 tokens
DeepSeek‑R1‑Distill‑Qwen‑32B‑GGUF – Ollama – Ada 48GB – 131072 tokens
Deployment Options
The platform supports two main deployment approaches:
Ollama : a lightweight inference framework focused on quantized models and open‑source LLMs.
Transformer : Hugging Face’s inference framework compatible with PyTorch, TensorFlow, and other model formats.
Method 1 – Application Template Deployment
Create a new project at https://cap.console.aliyun.com/projects.
Search for “DeepSeek” and select the template “Build an AI chat assistant based on DeepSeek‑R1”.
Choose a region and confirm deployment; the process typically takes about ten minutes.
After deployment, open the OpenWebUI service, enable public access, and interact with the model via the web UI.
Method 2 – Model Service Deployment
In the CAP console, create a blank project.
Add a “Model Service” component.
Select the desired model, e.g., DeepSeek‑R1‑Distill‑Qwen‑7B‑GGUF.
Configure resources (Tesla series GPUs are recommended; Ada GPUs for 14B+ models).
Preview and deploy; the platform will download the model (≈10 minutes).
Validate the service via the built‑in debugger, a local IDE, or third‑party platforms such as Chatbox.
Adding Custom Models
If a desired model is not listed, use the “More model sources” option to import from ModelScope. Provide the ModelScope ID (e.g., lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF) and the corresponding GGUF file name. For 14B and larger models, allocate an Ada GPU with at least 48 GB memory.
Validation
Test the deployed model through:
OpenWebUI chat interface.
CAP’s debugger and local IDE calls.
Third‑party platforms (e.g., Chatbox) by invoking the model’s API endpoint.
About CAP
CAP (Cloud Application Platform) is Alibaba Cloud’s one‑stop solution for building, deploying, and managing serverless applications. It integrates AI large‑language models with a Serverless architecture, offering low‑cost GPU model hosting, rapid template‑based app creation, and flexible component assembly for custom development.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
