How EasyDistill Simplifies LLM Knowledge Distillation for Faster, Smaller Models

EasyDistill, an open‑source toolkit from the Alibaba Cloud AI Platform (PAI), streamlines knowledge distillation of large language models by offering modular data synthesis, black‑box and white‑box training, and reinforcement‑learning and preference‑optimization techniques, enabling the creation of the compact, high‑performance DistilQwen models and their accompanying open datasets.

Alibaba Cloud Big Data AI Platform

EasyDistill Overview

Knowledge distillation transfers the knowledge of a large model to a smaller one without significantly degrading performance, reducing computational cost and enabling deployment in resource‑constrained environments. Alibaba Cloud AI Platform (PAI) released the open‑source EasyDistill toolkit to simplify this process for large language models.

Framework Features

EasyDistill provides modular components for data synthesis, basic and advanced distillation, and reinforcement‑learning based preference optimization. It supports both black‑box (API‑only access to the teacher) and white‑box (access to the teacher's internal outputs such as token‑level logits) training strategies, multiple loss functions (including KL divergence), and RL algorithms such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO).

Figure: EasyDistill framework architecture

Data Synthesis

EasyDistill integrates several data synthesis and augmentation operations that use proprietary and open‑source teacher models to expand instruction datasets and generate diverse chain‑of‑thought (CoT) data. Operations include instruction expansion, instruction optimization, and automatic generation of instruction‑response pairs, as well as CoT simplification and expansion operators to improve reasoning efficiency.
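As a rough illustration of how such synthesis can be driven, the sketch below calls a teacher model through an OpenAI‑compatible endpoint to expand one seed instruction into several variants. The endpoint URL, model name, and prompt wording are placeholders for illustration, not EasyDistill's actual synthesis operators.

# Illustrative sketch: expand a seed instruction via an OpenAI-compatible teacher endpoint.
# The base_url, api_key, and model name are placeholders; adapt them to your deployment.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint/v1", api_key="YOUR_TOKEN")

seed_instruction = "Summarize the following article in three sentences."
prompt = (
    "Rewrite the instruction below into three diverse variants that keep the same "
    f"intent but vary the wording and constraints:\n\n{seed_instruction}"
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # placeholder teacher model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.8,
)
print(response.choices[0].message.content)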

Basic Distillation Models

For closed‑source models, EasyDistill offers black‑box distillation based on supervised fine‑tuning (SFT), treating teacher outputs as ground truth. It supports any OpenAI‑compatible API (e.g., OpenAI, DashScope, PAI‑EAS). For open‑source teacher models, a white‑box strategy extracts token‑level logits and minimizes the divergence between teacher and student logits, optionally using only the top‑k logits to reduce computation.
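The white‑box objective can be pictured with the short PyTorch sketch below, which computes a KL‑divergence loss restricted to the teacher's top‑k logits. The function name, the renormalization over the top‑k vocabulary, and the temperature scaling are illustrative assumptions rather than EasyDistill's internal implementation.

# Sketch of white-box distillation: make the student match the teacher's
# distribution over the teacher's top-k tokens at each position.
import torch.nn.functional as F

def topk_kd_loss(student_logits, teacher_logits, k=20, temperature=1.0):
    # student_logits, teacher_logits: (batch, seq_len, vocab_size)
    topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)        # teacher's top-k logits
    student_topk = student_logits.gather(-1, topk_idx)          # student logits at the same tokens
    teacher_probs = F.softmax(topk_vals / temperature, dim=-1)  # renormalize over the top-k set
    student_log_probs = F.log_softmax(student_topk / temperature, dim=-1)
    # KL(teacher || student) over the reduced vocabulary, scaled by T^2 as is conventional.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2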

Advanced Distillation Training

To avoid over‑fitting to teacher outputs, EasyDistill incorporates reinforcement learning (RL) and preference optimization. It can train a reward model from teacher‑generated preference data (RLAIF) and apply PPO or Group Relative Policy Optimization. Direct Preference Optimization (DPO) and the proposed Cognitive Preference Optimization (CogPO) further align student models with teacher cognition.
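For intuition, the standard DPO objective that this stage builds on can be written as the short function below. It is a generic sketch (per‑response log‑probabilities are assumed to be pre‑computed under the student policy and a frozen reference model), not the toolkit's own training code.

# Generic DPO loss sketch: prefer the chosen response over the rejected one
# by a margin, measured relative to a frozen reference model.
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_ratio = policy_logp_chosen - ref_logp_chosen        # log pi/pi_ref for the preferred response
    rejected_ratio = policy_logp_rejected - ref_logp_rejected  # log pi/pi_ref for the dispreferred response
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()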

Getting Started

EasyDistill is modular; users select the needed modules and run them via a simple CLI.

git clone https://github.com/modelscope/easydistill
cd easydistill
python setup.py install
easydistill --config <config-file-path>

A sample black‑box configuration (JSON) is shown below:

{
  "job_type": "kd_black_box_local",
  "dataset": {
    "instruction_path": "train.json",
    "labeled_path": "train_labeled.json",
    "template": "chat_template/chat_template_kd.jinja",
    "seed": 42
  },
  "inference": {
    "enable_chunked_prefill": true,
    "seed": 777,
    "gpu_memory_utilization": 0.9,
    "temperature": 0.8,
    "trust_remote_code": true,
    "enforce_eager": false,
    "max_model_len": 4096,
    "max_new_tokens": 512
  },
  "models": {
    "teacher": "teacher/Qwen/Qwen2.5-7B-Instruct/",
    "student": "student/Qwen/Qwen2.5-0.5B-Instruct/"
  },
  "training": {
    "output_dir": "./result/",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "max_length": 512,
    "save_steps": 1000,
    "logging_steps": 1,
    "learning_rate": 2e-5,
    "weight_decay": 0.05,
    "warmup_ratio": 0.1,
    "lr_scheduler_type": "cosine"
  }
}

For teacher models behind closed‑source APIs, the inference section only needs the endpoint's base URL and API key rather than a local teacher model path:

{
  "job_type": "kd_black_box_api",
  "dataset": {
    "instruction_path": "train.json",
    "labeled_path": "train_labeled.json",
    "template": "./chat_template/chat_template_kd.jinja",
    "seed": 42
  },
  "inference": {
    "base_url": "ENDPOINT",
    "api_key": "TOKEN",
    "stream": true,
    "system_prompt": "You are a helpful assistant.",
    "max_new_tokens": 512
  },
  "models": {
    "student": "student/Qwen/Qwen2.5-0.5B-Instruct/"
  },
  "training": {
    "output_dir": "./result/",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "max_length": 512,
    "save_steps": 1000,
    "logging_steps": 1,
    "learning_rate": 2e-5,
    "weight_decay": 0.05,
    "warmup_ratio": 0.1,
    "lr_scheduler_type": "cosine"
  }
}

DistilQwen Model Family

Built on EasyDistill, the DistilQwen series comprises distilled versions of the Qwen model family. These models retain high performance while drastically reducing parameter count, making them suitable for deployment on edge devices.
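Once published, a distilled student can be loaded like any other causal language model. The snippet below is a hedged usage sketch with Hugging Face Transformers; the model ID is an assumption and should be checked against the actual names on ModelScope or the Hugging Face Hub.

# Usage sketch with Transformers; the model ID below is assumed, verify the published name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alibaba-pai/DistilQwen2.5-1.5B-Instruct"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [{"role": "user", "content": "Explain knowledge distillation in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))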

System 1 Models

System 1 models generate concise answers using fast, intuitive reasoning. DistilQwen‑2 and DistilQwen‑2.5 are enhanced instruction‑following models trained with SFT and then aligned with DPO for preference optimization.

System 2 Models

System 2 models adopt a slow‑thinking approach, first producing a chain‑of‑thought and then the final answer, improving deep reasoning. DistilQwen‑2.5‑R1 uses DeepSeek‑R1 as a teacher, and CogPO refines the generated CoT.

Variable‑Length Chain‑of‑Thought Model

DistilQwen‑ThoughtX introduces adaptive, variable‑length CoT generation based on task difficulty. It is trained on the OmniThought dataset (2 million CoT examples) annotated with Reasoning Verbosity (RV) and Cognitive Difficulty (CD) scores, achieving superior reasoning performance compared to closed‑source baselines.
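To explore the dataset, something like the sketch below can be used. The dataset ID comes from the article; the schema is inspected first because the exact field names for the RV and CD annotations are not given here and should be verified on the dataset card.

# Load OmniThought and inspect its schema before filtering by RV/CD scores.
from datasets import load_dataset

ds = load_dataset("alibaba-pai/OmniThought", split="train")
print(ds.column_names)  # check the actual field names for the RV/CD annotations
# Example filter, assuming per-sample "RV" and "CD" fields (verify before use):
# subset = ds.filter(lambda ex: 3 <= ex["RV"] <= 6 and ex["CD"] <= 6)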

Open Datasets

DistilQwen_100K – instruction‑following – 100 k samples – https://huggingface.co/datasets/alibaba-pai/DistilQwen_100k

DistilQwen_1M – instruction‑following – 1 M samples – https://huggingface.co/datasets/alibaba-pai/DistilQwen_1M

OmniThought – chain‑of‑thought reasoning – 2 M samples – https://huggingface.co/datasets/alibaba-pai/OmniThought

Conclusion

EasyDistill lowers the barrier to LLM knowledge distillation by providing a unified, extensible toolkit that supports data synthesis, black‑box and white‑box training, and advanced RL‑based alignment. The released DistilQwen models and associated datasets demonstrate that compact models can achieve competitive or superior performance, especially in reasoning‑heavy scenarios.
