How to Distill and Fine‑Tune DeepSeek R1 with Qwen on Alibaba Cloud PAI
This guide walks through the complete workflow on Alibaba Cloud PAI: preparing an instruction dataset, deploying the DeepSeek‑R1 teacher model, generating teacher responses, distilling them into a smaller Qwen2.5‑7B‑Instruct student model, fine‑tuning it, and deploying the final service, with performance comparisons on several math‑reasoning benchmarks.
Overview
The DeepSeek series of models has attracted global attention for its strong performance, often matching or surpassing top closed‑source models. Since February 2025, Alibaba Cloud's AI platform, PAI, has released best‑practice guides covering deployment, application building, distillation, and fine‑tuning of DeepSeek‑R1, DeepSeek‑V3, and related models.
Development Process Overview
Distilling a large language model (LLM) transfers the reasoning ability of a big model to a smaller one, and fine‑tuning further adapts the small model to specific tasks, enabling efficient use of compute resources.
Prepare instruction dataset.
Deploy teacher LLM (DeepSeek‑R1) and generate responses.
Distill and train student model (Qwen2.5‑7B‑Instruct) and deploy the service.
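Concretely, the output of the teacher stage is a set of (instruction, teacher output) pairs, and the student is fine‑tuned on them as ordinary supervised chat data. A minimal sketch of how one such record maps to a chat‑style SFT sample (the helper name and chat layout are illustrative, not PAI's exact training format):

```python
def to_sft_sample(record):
    """Map an {'instruction', 'output'} record to a chat-style SFT sample."""
    return {
        "messages": [
            {"role": "user", "content": record["instruction"]},
            {"role": "assistant", "content": record["output"]},
        ]
    }

# A record shaped like the output of the teacher-annotation step below.
record = {
    "instruction": "Solve 2^(2x) - 8*2^x + 12 = 0.",
    "output": "<think>Substitute y = 2^x ...</think> x = 1 or x = log2(6).",
}
print(to_sft_sample(record)["messages"][0]["role"])  # user
```

Keeping the teacher's full chain of thought in the assistant turn is what transfers the reasoning behavior, not just the final answers.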
Required Alibaba Cloud Products
This workflow uses the PAI platform and Object Storage Service (OSS). Ensure a PAI workspace has been created and an OSS bucket is available.
Step 1 – Prepare Instruction Dataset
The dataset should contain at least several hundred entries and cover deep‑reasoning tasks such as scientific research, math problems, logic puzzles, and complex decision‑making. Sources should be reliable (publications, encyclopedias, Q&A platforms, research papers), and the data should be pre‑processed to remove duplicates and errors. The dataset must be a JSON file in which each entry has an instruction field, e.g.:
[
    {
        "instruction": "Return your final response within \\boxed{}. The equation $2^{2x}-8\\cdot 2^x+12=0$ is satisfied by: ..."
    },
    {
        "instruction": "Return your final response within \\boxed{}. In $\\triangle ABC$ ..."
    }
]
Optionally download a seed dataset:
wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/Distil_R1/Bespoke-Stratos-17k_thought.json
Step 2 – Deploy Teacher Model (DeepSeek‑R1)
Log in to the PAI console, select a region, and open a workspace. Navigate to Quick Start > Model Gallery, choose the DeepSeek‑R1 model card, and submit a deployment task. The model can be accelerated with SGLang, vLLM, or BladeLLM.
Call the Model Service
After deployment, use the provided API endpoint and token. Example using the OpenAI‑compatible SDK:
from openai import OpenAI

openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"

client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model = client.models.list().data[0].id

def main():
    stream = True
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": [{"type": "text", "text": "Hello, please introduce yourself in as much detail as possible."}]}],
        model=model,
        max_completion_tokens=1024,
        stream=stream,
    )
    if stream:
        for chunk in chat_completion:
            # delta.content can be None on some chunks, so fall back to "".
            print(chunk.choices[0].delta.content or "", end="")
    else:
        print(chat_completion.choices[0].message.content)

if __name__ == "__main__":
    main()
Batch Generate Teacher Annotations
Obtain the service URL and token from the Model Gallery page, then batch‑call the API to label your own JSON dataset. Example:
import json

import tqdm
from openai import OpenAI

# API configuration (same endpoint and key as above)
openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"

client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model = client.models.list().data[0].id

def generate_response(ins):
    chat = client.chat.completions.create(
        messages=[{"role": "user", "content": [{"type": "text", "text": ins}]}],
        model=model,
        max_completion_tokens=4096,
        stream=True,
    )
    res = ""
    for chunk in chat:
        # delta.content can be None on some chunks, so fall back to "".
        res += chunk.choices[0].delta.content or ""
    return res

with open("input.json") as fp:
    data = json.load(fp)

new_data = []
for d in tqdm.tqdm(data):
    prompt = d["instruction"]
    output = generate_response(prompt)
    new_data.append({"instruction": prompt, "output": output})

with open("output.json", "w") as f:
    json.dump(new_data, f, ensure_ascii=False, indent=4)
Step 3 – Distill and Fine‑Tune Student Model
Select the Qwen2.5‑7B‑Instruct model in Model Gallery, click “Fine‑Tune Training”, and use full‑parameter fine‑tuning for best results. Configure key hyper‑parameters (default values are recommended) and start training.
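The exact field names in the console may vary between Model Gallery versions; purely as an illustration, a full‑parameter SFT run of this kind is often configured along these lines (every value below is an assumption, and the defaults pre‑filled by Model Gallery remain the recommended starting point):

```json
{
  "learning_rate": 5e-6,
  "num_train_epochs": 2,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "seq_length": 4096,
  "lr_scheduler_type": "cosine"
}
```

A long sequence length matters here because the teacher's chain‑of‑thought outputs are much longer than typical instruction data; truncating them discards the reasoning the distillation is meant to transfer.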
After training completes, deploy the fine‑tuned model to PAI‑EAS with one click.
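Once deployed, the student exposes the same OpenAI‑compatible chat API as the teacher service above, so it can be called with the same SDK pattern. A small sketch that only assembles the request payload (the model name qwen2.5-7b-distilled and the field values are assumptions; sending it requires the student service's EAS endpoint and key from the console):

```python
def build_request(instruction, model="qwen2.5-7b-distilled"):
    """Assemble a chat-completion payload for the fine-tuned student service."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": instruction}],
        "max_completion_tokens": 2048,
    }

payload = build_request(
    "Return your final response within \\boxed{}. Solve 2^(2x) - 8*2^x + 12 = 0."
)
# Send with client.chat.completions.create(**payload), where client is an
# OpenAI(...) instance pointed at the student's EAS endpoint, as shown earlier.
```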
Results Comparison
Evaluation on several math‑reasoning benchmarks shows notable improvements after distillation and fine‑tuning. For example, the original Qwen2.5‑7B‑Instruct scores 10.0 on AIME2024, 74.6 on MATH500, 89.54 on GSM8K, and 33.84 on GPQA Diamond, while the fine‑tuned model reaches 20.0, 80.0, 92.95, and 37.37 respectively.
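To put those numbers in perspective, the absolute gains can be computed directly from the reported scores:

```python
# Benchmark scores from the comparison above:
# (base Qwen2.5-7B-Instruct, distilled and fine-tuned model).
scores = {
    "AIME2024": (10.0, 20.0),
    "MATH500": (74.6, 80.0),
    "GSM8K": (89.54, 92.95),
    "GPQA Diamond": (33.84, 37.37),
}
for bench, (base, tuned) in scores.items():
    print(f"{bench}: +{tuned - base:.2f} points")
```

The largest relative jump is on AIME2024, where the fine‑tuned model doubles the base score, consistent with distillation helping most on the hardest multi‑step reasoning problems.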
Related Documentation
Model Online Service (EAS) – https://help.aliyun.com/zh/pai/user-guide/overview-2
Scenario Practices – https://help.aliyun.com/zh/pai/user-guide/best-practices/
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.