Cloud Native 10 min read

How to Deploy DeepSeek‑R1‑Distill Models on Alibaba Cloud CAP (Ollama & Transformer)

This guide walks you through deploying various DeepSeek‑R1‑Distill models on Alibaba Cloud's Serverless AI platform CAP, covering supported models, deployment options (Ollama and Transformer), step‑by‑step template and model‑service setups, validation methods, and tips for adding custom models.

Alibaba Cloud Native
Alibaba Cloud Native
Alibaba Cloud Native
How to Deploy DeepSeek‑R1‑Distill Models on Alibaba Cloud CAP (Ollama & Transformer)

Background

DeepSeek‑R1‑Distill models are compact, low‑cost variants of the DeepSeek‑R1 family that retain strong benchmark performance. Alibaba Cloud’s Cloud Application Platform (CAP) provides a Serverless + AI environment where these models can be deployed as either application templates or dedicated model services.

Supported Models

DeepSeek‑R1‑Distill‑Qwen‑1.5B – Transformer – Tesla 16GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑7B – Transformer – Tesla 16GB – 131072 tokens

DeepSeek‑R1‑Distill‑Llama‑8B – Transformer – Tesla 16GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑14B – Transformer – Ada 48GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑32B – Transformer – Ada 48GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑1.5B‑GGUF – Ollama – Tesla 8GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑7B‑GGUF – Ollama – Tesla 16GB – 131072 tokens

DeepSeek‑R1‑Distill‑Llama‑8B‑GGUF – Ollama – Tesla 16GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑14B‑GGUF – Ollama – Ada 48GB – 131072 tokens

DeepSeek‑R1‑Distill‑Qwen‑32B‑GGUF – Ollama – Ada 48GB – 131072 tokens

Deployment Options

The platform supports two main deployment approaches:

Ollama : a lightweight inference framework focused on quantized models and open‑source LLMs.

Transformer : Hugging Face’s inference framework compatible with PyTorch, TensorFlow, and other model formats.

Method 1 – Application Template Deployment

Create a new project at https://cap.console.aliyun.com/projects.

Search for “DeepSeek” and select the template “Build an AI chat assistant based on DeepSeek‑R1”.

Choose a region and confirm deployment; the process typically takes about ten minutes.

After deployment, open the OpenWebUI service, enable public access, and interact with the model via the web UI.

DeepSeek template selection
DeepSeek template selection

Method 2 – Model Service Deployment

In the CAP console, create a blank project.

Add a “Model Service” component.

Select the desired model, e.g., DeepSeek‑R1‑Distill‑Qwen‑7B‑GGUF.

Configure resources (Tesla series GPUs are recommended; Ada GPUs for 14B+ models).

Preview and deploy; the platform will download the model (≈10 minutes).

Validate the service via the built‑in debugger, a local IDE, or third‑party platforms such as Chatbox.

Resource configuration
Resource configuration

Adding Custom Models

If a desired model is not listed, use the “More model sources” option to import from ModelScope. Provide the ModelScope ID (e.g., lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF) and the corresponding GGUF file name. For 14B and larger models, allocate an Ada GPU with at least 48 GB memory.

Validation

Test the deployed model through:

OpenWebUI chat interface.

CAP’s debugger and local IDE calls.

Third‑party platforms (e.g., Chatbox) by invoking the model’s API endpoint.

About CAP

CAP (Cloud Application Platform) is Alibaba Cloud’s one‑stop solution for building, deploying, and managing serverless applications. It integrates AI large‑language models with a Serverless architecture, offering low‑cost GPU model hosting, rapid template‑based app creation, and flexible component assembly for custom development.

CAP overview
CAP overview
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ServerlessAITransformerModel DeploymentDeepSeekOllamaCAP
Alibaba Cloud Native
Written by

Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.