How to Distill and Fine‑Tune DeepSeek R1 with Qwen on Alibaba Cloud PAI
This guide walks through the complete workflow on Alibaba Cloud PAI: preparing an instruction dataset, deploying the DeepSeek‑R1 teacher model, generating teacher responses, distilling them into a smaller Qwen2.5‑7B‑Instruct student model, fine‑tuning it, and deploying the final service, with performance comparisons on several math‑reasoning benchmarks.
Overview
The DeepSeek series of models has attracted global attention for its strong performance, often matching or surpassing top closed‑source models. Since February 2025, Alibaba Cloud's AI platform, PAI, has released best‑practice guides covering deployment, application building, distillation, and fine‑tuning of DeepSeek‑R1, DeepSeek‑V3, and related models.
Development Process Overview
Distilling a large language model (LLM) transfers the reasoning ability of a big model to a smaller one, and fine‑tuning further adapts the small model to specific tasks, enabling efficient use of compute resources.
Prepare instruction dataset.
Deploy teacher LLM (DeepSeek‑R1) and generate responses.
Distill and train student model (Qwen2.5‑7B‑Instruct) and deploy the service.
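Concretely, the output of the teacher stage is a set of (instruction, teacher output) pairs, and the student is fine‑tuned on them as ordinary supervised chat data. A minimal sketch of how one such record maps to a chat‑style SFT sample (the helper name and chat layout are illustrative, not PAI's exact training format):

```python
def to_sft_sample(record):
    """Map an {'instruction', 'output'} record to a chat-style SFT sample."""
    return {
        "messages": [
            {"role": "user", "content": record["instruction"]},
            {"role": "assistant", "content": record["output"]},
        ]
    }

# A record shaped like the output of the teacher-annotation step below.
record = {
    "instruction": "Solve 2^(2x) - 8*2^x + 12 = 0.",
    "output": "<think>Substitute y = 2^x ...</think> x = 1 or x = log2(6).",
}
print(to_sft_sample(record)["messages"][0]["role"])  # user
```

Keeping the teacher's full chain of thought in the assistant turn is what transfers the reasoning behavior, not just the final answers.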
Required Alibaba Cloud Products
This workflow uses the PAI platform and Object Storage Service (OSS). Ensure a PAI workspace has been created and an OSS bucket is available.
Step 1 – Prepare Instruction Dataset
The dataset should contain at least several hundred entries and cover deep‑reasoning tasks such as scientific research, math problems, logic puzzles, and complex decision‑making. Sources should be reliable (publications, encyclopedias, Q&A platforms, research papers), and the data should be pre‑processed to remove duplicates and errors. The dataset must be a JSON file in which each entry has an instruction field, e.g.:
[
    {
        "instruction": "Return your final response within \\boxed{}. The equation $2^{2x}-8\\cdot 2^x+12=0$ is satisfied by: ..."
    },
    {
        "instruction": "Return your final response within \\boxed{}. In $\\triangle ABC$ ..."
    }
]
Optionally download a seed dataset:
wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/Distil_R1/Bespoke-Stratos-17k_thought.json
Step 2 – Deploy Teacher Model (DeepSeek‑R1)
Log in to the PAI console, select a region, and open a workspace. Navigate to Quick Start > Model Gallery, choose the DeepSeek‑R1 model card, and submit a deployment task. The model can be accelerated with SGLang, vLLM, or BladeLLM.
Call the Model Service
After deployment, use the provided API endpoint and token. Example using the OpenAI‑compatible SDK:
from openai import OpenAI

openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"

client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model = client.models.list().data[0].id

def main():
    stream = True
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": [{"type": "text", "text": "Hello, please introduce yourself in as much detail as possible."}]}],
        model=model,
        max_completion_tokens=1024,
        stream=stream,
    )
    if stream:
        for chunk in chat_completion:
            # delta.content can be None on some chunks, so fall back to "".
            print(chunk.choices[0].delta.content or "", end="")
    else:
        print(chat_completion.choices[0].message.content)

if __name__ == "__main__":
    main()
Batch Generate Teacher Annotations
Obtain the service URL and token from the Model Gallery page, then batch‑call the API to label your own JSON dataset. Example:
import json

import tqdm
from openai import OpenAI

# API configuration (same endpoint and key as above)
openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"

client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model = client.models.list().data[0].id

def generate_response(ins):
    chat = client.chat.completions.create(
        messages=[{"role": "user", "content": [{"type": "text", "text": ins}]}],
        model=model,
        max_completion_tokens=4096,
        stream=True,
    )
    res = ""
    for chunk in chat:
        # delta.content can be None on some chunks, so fall back to "".
        res += chunk.choices[0].delta.content or ""
    return res

with open("input.json") as fp:
    data = json.load(fp)

new_data = []
for d in tqdm.tqdm(data):
    prompt = d["instruction"]
    output = generate_response(prompt)
    new_data.append({"instruction": prompt, "output": output})

with open("output.json", "w") as f:
    json.dump(new_data, f, ensure_ascii=False, indent=4)
Step 3 – Distill and Fine‑Tune Student Model
Select the Qwen2.5‑7B‑Instruct model in Model Gallery, click “Fine‑Tune Training”, and use full‑parameter fine‑tuning for best results. Configure key hyper‑parameters (default values are recommended) and start training.
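The exact field names in the console may vary between Model Gallery versions; purely as an illustration, a full‑parameter SFT run of this kind is often configured along these lines (every value below is an assumption, and the defaults pre‑filled by Model Gallery remain the recommended starting point):

```json
{
  "learning_rate": 5e-6,
  "num_train_epochs": 2,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "seq_length": 4096,
  "lr_scheduler_type": "cosine"
}
```

A long sequence length matters here because the teacher's chain‑of‑thought outputs are much longer than typical instruction data; truncating them discards the reasoning the distillation is meant to transfer.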
After training completes, deploy the fine‑tuned model to PAI‑EAS with one click.
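Once deployed, the student exposes the same OpenAI‑compatible chat API as the teacher service above, so it can be called with the same SDK pattern. A small sketch that only assembles the request payload (the model name qwen2.5-7b-distilled and the field values are assumptions; sending it requires the student service's EAS endpoint and key from the console):

```python
def build_request(instruction, model="qwen2.5-7b-distilled"):
    """Assemble a chat-completion payload for the fine-tuned student service."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": instruction}],
        "max_completion_tokens": 2048,
    }

payload = build_request(
    "Return your final response within \\boxed{}. Solve 2^(2x) - 8*2^x + 12 = 0."
)
# Send with client.chat.completions.create(**payload), where client is an
# OpenAI(...) instance pointed at the student's EAS endpoint, as shown earlier.
```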
Results Comparison
Evaluation on several math‑reasoning benchmarks shows notable improvements after distillation and fine‑tuning. For example, the original Qwen2.5‑7B‑Instruct scores 10.0 on AIME2024, 74.6 on MATH500, 89.54 on GSM8K, and 33.84 on GPQA Diamond, while the fine‑tuned model reaches 20.0, 80.0, 92.95, and 37.37 respectively.
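To put those numbers in perspective, the absolute gains can be computed directly from the reported scores:

```python
# Benchmark scores from the comparison above:
# (base Qwen2.5-7B-Instruct, distilled and fine-tuned model).
scores = {
    "AIME2024": (10.0, 20.0),
    "MATH500": (74.6, 80.0),
    "GSM8K": (89.54, 92.95),
    "GPQA Diamond": (33.84, 37.37),
}
for bench, (base, tuned) in scores.items():
    print(f"{bench}: +{tuned - base:.2f} points")
```

The largest relative jump is on AIME2024, where the fine‑tuned model doubles the base score, consistent with distillation helping most on the hardest multi‑step reasoning problems.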
Related Documentation
Model Online Service (EAS) – https://help.aliyun.com/zh/pai/user-guide/overview-2
Scenario Practices – https://help.aliyun.com/zh/pai/user-guide/best-practices/
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.