How EasyDistill Simplifies LLM Knowledge Distillation for Faster, Smaller Models
EasyDistill, an open‑source toolkit from the Alibaba Cloud AI Platform, streamlines knowledge distillation of large language models. It offers modular data synthesis, black‑box and white‑box training, and reinforcement‑learning and preference‑optimization techniques, enabling the creation of compact, high‑performance DistilQwen models and their accompanying open datasets.
EasyDistill Overview
Knowledge distillation transfers the knowledge of a large model to a smaller one without significantly degrading performance, reducing computational cost and enabling deployment in resource‑constrained environments. Alibaba Cloud AI Platform (PAI) released the open‑source EasyDistill toolkit to simplify this process for large language models.
Framework Features
EasyDistill provides modular components for data synthesis, basic and advanced distillation, and reinforcement‑learning based preference optimization. It supports both black‑box (API‑only) and white‑box (access to hidden states) training strategies, multiple loss functions (including KL‑divergence), and various RL algorithms such as PPO and Group Relative Policy Optimization.
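Among the supported RL algorithms, Group Relative Policy Optimization replaces PPO's learned value baseline with a group‑relative one: several responses are sampled per prompt, scored, and each reward is normalized against its group's mean and standard deviation. A minimal sketch of that normalization step (the function name and array shapes are illustrative, not EasyDistill's API):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each response's reward against its own group's statistics.

    rewards: shape (num_prompts, group_size) -- one row per prompt,
    one column per sampled response to that prompt.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four sampled responses each.
adv = group_relative_advantages([[1.0, 0.0, 0.5, 0.5],
                                 [2.0, 2.0, 2.0, 2.0]])
```

Note that a group whose responses all receive the same reward (the second row) yields near‑zero advantages, so it contributes essentially no policy gradient.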
Data Synthesis
EasyDistill integrates several data synthesis and augmentation operations that use proprietary and open‑source teacher models to expand instruction datasets and generate diverse chain‑of‑thought (CoT) data. Operations include instruction expansion, instruction optimization, and automatic generation of instruction‑response pairs, as well as CoT simplification and expansion operators to improve reasoning efficiency.
Basic Distillation Models
For closed‑source models, EasyDistill offers black‑box distillation based on supervised fine‑tuning (SFT), treating teacher outputs as ground truth. It supports any OpenAI‑compatible API (e.g., OpenAI, DashScope, PAI‑EAS). For open‑source teacher models, a white‑box strategy extracts token‑level logits and minimizes the divergence between teacher and student logits, optionally using only the top‑k logits to reduce computation.
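The white‑box objective described above can be sketched as a KL divergence between teacher and student token distributions, restricted to the teacher's top‑k vocabulary entries to cut memory and compute. The following numpy sketch illustrates the idea under assumed shapes; it is not EasyDistill's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stabilization
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_kl_loss(teacher_logits, student_logits, k=4):
    """Mean KL(teacher || student) over the teacher's top-k entries.

    Both inputs: (seq_len, vocab_size). Probabilities are renormalized
    over the selected top-k entries before the divergence is computed.
    """
    idx = np.argsort(teacher_logits, axis=-1)[:, -k:]      # teacher's top-k token ids
    t = np.take_along_axis(teacher_logits, idx, axis=-1)
    s = np.take_along_axis(student_logits, idx, axis=-1)
    p, q = softmax(t), softmax(s)                          # renormalize on the top-k
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 32))
loss_self = topk_kl_loss(teacher, teacher)                 # identical logits -> 0
loss_other = topk_kl_loss(teacher, rng.normal(size=(8, 32)))
```

Since KL divergence is zero only when the two (renormalized) distributions match, `loss_self` vanishes while `loss_other` is strictly positive.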
Advanced Distillation Training
To avoid over‑fitting to teacher outputs, EasyDistill incorporates reinforcement learning (RL) and preference optimization. It can train a reward model from teacher‑generated preference data (RLAIF) and apply PPO or Group Relative Policy Optimization. Direct Preference Optimization (DPO) and the proposed Cognitive Preference Optimization (CogPO) further align student models with teacher cognition.
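Of these, DPO skips the explicit reward model entirely: it pushes the student to prefer the chosen response over the rejected one, measured relative to a frozen reference policy. A toy sketch of the per‑pair loss on sequence log‑probabilities (names and values are illustrative only):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares policy-vs-reference log-ratios."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The loss shrinks as the policy widens its preference for the chosen response.
weak = dpo_loss(-10.0, -10.0, -10.0, -10.0)    # no preference yet: loss = log(2)
strong = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy now prefers the chosen response
```

When the policy matches the reference, the margin is zero and the loss sits at log 2; as the policy's preference for the chosen response grows, the loss decreases toward zero.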
Getting Started
EasyDistill is modular; users select the needed modules and run them via a simple CLI.
git clone https://github.com/modelscope/easydistill
cd easydistill
python setup.py install
easydistill --config <config-file-path>
A sample black‑box configuration (JSON) is shown below:
{
"job_type": "kd_black_box_local",
"dataset": {
"instruction_path": "train.json",
"labeled_path": "train_labeled.json",
"template": "chat_template/chat_template_kd.jinja",
"seed": 42
},
"inference": {
"enable_chunked_prefill": true,
"seed": 777,
"gpu_memory_utilization": 0.9,
"temperature": 0.8,
"trust_remote_code": true,
"enforce_eager": false,
"max_model_len": 4096,
"max_new_tokens": 512
},
"models": {
"teacher": "teacher/Qwen/Qwen2.5-7B-Instruct/",
"student": "student/Qwen/Qwen2.5-0.5B-Instruct/"
},
"training": {
"output_dir": "./result/",
"num_train_epochs": 3,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 8,
"max_length": 512,
"save_steps": 1000,
"logging_steps": 1,
"learning_rate": 2e-5,
"weight_decay": 0.05,
"warmup_ratio": 0.1,
"lr_scheduler_type": "cosine"
}
}
For closed‑source APIs, only the base URL and API key need to be provided:
{
"job_type": "kd_black_box_api",
"dataset": {
"instruction_path": "train.json",
"labeled_path": "train_labeled.json",
"template": "./chat_template/chat_template_kd.jinja",
"seed": 42
},
"inference": {
"base_url": "ENDPOINT",
"api_key": "TOKEN",
"stream": true,
"system_prompt": "You are a helpful assistant.",
"max_new_tokens": 512
},
"models": {
"student": "student/Qwen/Qwen2.5-0.5B-Instruct/"
},
"training": {
"output_dir": "./result/",
"num_train_epochs": 3,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 8,
"max_length": 512,
"save_steps": 1000,
"logging_steps": 1,
"learning_rate": 2e-5,
"weight_decay": 0.05,
"warmup_ratio": 0.1,
"lr_scheduler_type": "cosine"
}
}
DistilQwen Model Family
Built on EasyDistill, the DistilQwen series are distilled versions of the Qwen family. They retain high performance while drastically reducing parameter count, making them suitable for edge devices.
System 1 Models
System 1 models produce concise answers through fast, intuitive reasoning. DistilQwen‑2 and DistilQwen‑2.5 are enhanced instruction‑following models trained with SFT followed by DPO for preference alignment.
System 2 Models
System 2 models adopt a slow‑thinking approach, first producing a chain‑of‑thought and then the final answer, improving deep reasoning. DistilQwen‑2.5‑R1 uses DeepSeek‑R1 as a teacher, and CogPO refines the generated CoT.
Variable‑Length Chain‑of‑Thought Model
DistilQwen‑ThoughtX introduces adaptive, variable‑length CoT generation based on task difficulty. It is trained on the OmniThought dataset (2 million CoT examples) annotated with Reasoning Verbosity (RV) and Cognitive Difficulty (CD) scores, achieving superior reasoning performance compared to closed‑source baselines.
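In practice, such annotations let training data be matched to a student: keep CoT traces whose Cognitive Difficulty fits the student's capacity and whose Reasoning Verbosity suits the task. A hypothetical filtering sketch (the field names are assumptions, not the actual OmniThought schema):

```python
def select_cots(examples, max_cd, rv_range):
    """Keep CoT examples a small student can absorb: difficulty capped
    at max_cd, verbosity within rv_range (inclusive)."""
    lo, hi = rv_range
    return [ex for ex in examples
            if ex["cd"] <= max_cd and lo <= ex["rv"] <= hi]

corpus = [
    {"id": 1, "rv": 3, "cd": 2},   # short, easy -> keep
    {"id": 2, "rv": 9, "cd": 2},   # too verbose for the target
    {"id": 3, "rv": 5, "cd": 8},   # too hard for a small student
]
kept = select_cots(corpus, max_cd=5, rv_range=(1, 6))
```

Only the first example survives both thresholds; raising `max_cd` or widening `rv_range` would admit harder or longer traces.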
Open Datasets
DistilQwen_100K – instruction‑following – 100 k samples – https://huggingface.co/datasets/alibaba-pai/DistilQwen_100k
DistilQwen_1M – instruction‑following – 1 M samples – https://huggingface.co/datasets/alibaba-pai/DistilQwen_1M
OmniThought – chain‑of‑thought reasoning – 2 M samples – https://huggingface.co/datasets/alibaba-pai/OmniThought
Conclusion
EasyDistill lowers the barrier to LLM knowledge distillation by providing a unified, extensible toolkit that supports data synthesis, black‑box and white‑box training, and advanced RL‑based alignment. The released DistilQwen models and associated datasets demonstrate that compact models can achieve competitive or superior performance, especially in reasoning‑heavy scenarios.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
