How DistilQwen2.5-DS3-0324 Achieves Fast, Accurate Reasoning via Quick‑Think Distillation

This article introduces DistilQwen2.5-DS3-0324, a distilled language-model series that balances rapid inference with strong reasoning through a fast-thinking chain-of-thought strategy. It details the two-stage distillation framework, reports evaluations on diverse benchmarks, and provides code for downloading and using the models.


DistilQwen2.5-DS3-0324 Overview

DistilQwen2.5-DS3-0324 is a distilled language‑model series released by Alibaba Cloud PAI, built by transferring the fast‑thinking inference ability of DeepSeek‑V3‑0324 into lightweight models. It combines quick‑thinking strategies with chain‑of‑thought distillation to achieve high inference speed while preserving complex reasoning capabilities.

Distillation Technique

The framework consists of two stages: (1) fast-thinking CoT data collection and alignment, in which CoT trajectories produced by the teacher model are shortened and aligned with the cognitive capacity of the smaller student; and (2) supervised fine-tuning (SFT) of the Qwen2.5-based student models on the aligned data.
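
As a concrete illustration of stage (2), the snippet below shows one way the aligned data could be rendered into chat-formatted training text before fine-tuning. This is a minimal sketch, not the released training pipeline; the "question" and "cot_answer" field names are illustrative assumptions.

from transformers import AutoTokenizer

# Tokenizer of the student model; its chat template defines the training format.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

def to_training_text(example: dict) -> str:
    # "question" and "cot_answer" are assumed field names for the aligned data.
    messages = [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["cot_answer"]},
    ]
    # Render the whole conversation (no generation prompt) so SFT teaches
    # the student to reproduce the difficulty-aligned CoT answer.
    return tokenizer.apply_chat_template(messages, tokenize=False)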

To address the mismatch between large-model and small-model reasoning paths, an LLM-as-a-Judge paradigm grades the difficulty of each CoT (easy, medium, hard) and either expands easy chains or compresses hard ones until they reach medium difficulty, ensuring the student can follow them.
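
The control flow of that alignment loop can be sketched as follows. This is a hand-written illustration, not the released pipeline: judge_difficulty and rewrite_cot are hypothetical placeholders for the actual LLM-as-a-Judge and rewriting calls, with the judge mocked by step count here so the loop is runnable.

def judge_difficulty(cot: str) -> str:
    """Grade a CoT as 'easy', 'medium', or 'hard'.
    Placeholder for an LLM-as-a-Judge call; mocked here by counting steps."""
    steps = len([line for line in cot.splitlines() if line.strip()])
    return "easy" if steps < 4 else "hard" if steps > 12 else "medium"

def rewrite_cot(cot: str, mode: str) -> str:
    """Expand ('expand') or shorten ('compress') a chain.
    Placeholder for a rewriting-LLM call."""
    raise NotImplementedError("plug in your rewriting LLM here")

def align_cot(cot: str, max_rounds: int = 3) -> str:
    # Iterate until the judge rates the chain 'medium', then stop.
    for _ in range(max_rounds):
        grade = judge_difficulty(cot)
        if grade == "medium":
            return cot
        # Easy chains get extra intermediate steps; hard chains get pruned.
        cot = rewrite_cot(cot, "expand" if grade == "easy" else "compress")
    return cot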

Model Evaluation

DistilQwen2.5-DS3-0324 was evaluated on benchmarks covering mathematics (AIME2024, MATH-500), code (LiveCodeBench V2), and scientific reasoning (GPQA-Diamond, MMLU-Pro). Across the 7B, 14B, and 32B parameter sizes, the series consistently outperformed the original Qwen2.5 models and several closed-source models while generating far fewer output tokens.

Usage Example

Below is a minimal Python snippet for loading the 7B model with Hugging Face Transformers (>= 4.37.0):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alibaba-pai/DistilQwen2.5-DS3-0324-7B"
# Load in the model's native precision and let Transformers place it on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "xxxxx"  # replace with your question
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You should think step-by-step."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=2048)
# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Model Download

The models are publicly available on Hugging Face and ModelScope. Example download code:

from huggingface_hub import snapshot_download

# Download all three checkpoints from the Hugging Face Hub.
for name in ["DistilQwen2.5-DS3-0324-7B", "DistilQwen2.5-DS3-0324-14B", "DistilQwen2.5-DS3-0324-32B"]:
    repo = f"alibaba-pai/{name}"
    snapshot_download(repo_id=repo, cache_dir=f"./{name}/")
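
Since the models are also hosted on ModelScope, a corresponding download with the modelscope SDK might look like the sketch below. The repo IDs are assumed to mirror the Hugging Face names; verify them on the ModelScope hub before use.

from modelscope import snapshot_download

# Assumed ModelScope repo IDs mirroring the Hugging Face names.
for name in ["DistilQwen2.5-DS3-0324-7B", "DistilQwen2.5-DS3-0324-14B", "DistilQwen2.5-DS3-0324-32B"]:
    snapshot_download(f"alibaba-pai/{name}", cache_dir=f"./{name}/")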

Conclusion and Future Work

DistilQwen2.5‑DS3‑0324 demonstrates that combining knowledge distillation with a fast‑thinking chain‑of‑thought strategy can produce lightweight models that retain strong reasoning ability and operate efficiently on resource‑constrained devices. Future work will further improve the distillation pipeline and expand the model family.

Tags: deep learning, large language models, chain of thought, model distillation, fast inference
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
