How to Distill Large Language Models for Efficient Text Generation with EasyDistill
This guide explains how to use the EasyDistill framework and Alibaba Cloud PAI to distill large language models for high‑quality text generation, covering model deployment, SFT and DPO training data construction, code examples, configuration files, and best practices for achieving resource‑efficient, high‑performance student models.
Background
Large language models excel at generating high‑quality copy, but their massive compute and storage requirements make deployment in resource‑constrained environments challenging. Model distillation transfers knowledge from a large model to a smaller, more efficient one, preserving most performance while drastically reducing resource consumption.
Deploy Teacher Large Language Model
In the PAI Model Gallery, select a teacher model such as DeepSeek‑V3. The platform provides default service and resource configurations, which can be adjusted before clicking the Deploy button.
Model Deployment and Invocation
PAI supports deployment via SGLang, vLLM, or Transformers. After deployment, the model can be accessed through an OpenAI‑compatible API.
from openai import OpenAI
# API configuration
openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
models = client.models.list()
model = models.data[0].id
print(model)
def main():
    stream = True
    chat_completion = client.chat.completions.create(
        messages=[
            {"role": "user", "content": [{"type": "text", "text": "Hello, please introduce yourself in as much detail as possible."}]}
        ],
        model=model,
        max_completion_tokens=1024,
        stream=stream,
    )
    if stream:
        # Streamed responses arrive as incremental delta chunks
        for chunk in chat_completion:
            print(chunk.choices[0].delta.content, end="")
    else:
        result = chat_completion.choices[0].message.content
        print(result)

if __name__ == "__main__":
    main()

Build SFT Training Data
Create a JSON list in which each item contains an instruction field (the prompt); the teacher model's response will later be added as an output field. Example input format:
[
{"instruction": "xxx"},
{"instruction": "xxx"},
{"instruction": "xxx"}
]

Use a task template such as:
You are an expert in short-video copywriting, specializing in generating a title and body for promotional copy based on a video's original title and content.
Your task is to ensure the copy closely matches the video's core content and attracts users to click.
Requirements:
1. Information accuracy: the copy must accurately reflect the video's core highlights; fabricated content not shown in the video is forbidden.
2. Emotional fit: the tone of the copy must match the video content; do not use a humorous or mocking style for serious or sad material.
3. Content quality: sentences must be clear, complete, fluent, and coherent, with no meaningless characters.
4. Output strictly in JSON format:
{
  "title": "",
  "body": ""
}

Batch-invoke the teacher model to generate outputs, then save the results:
import json

from openai import OpenAI

# API configuration
openai_api_key = "<EAS API KEY>"
openai_api_base = "<EAS API Endpoint>/v1"

client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)

# Use the first model served by the endpoint
model = client.models.list().data[0].id

def read_input_data(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)

def get_model_output(instruction):
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": [{"type": "text", "text": instruction}]}],
        model=model,
        max_completion_tokens=1024,
        stream=False,
    )
    return chat_completion.choices[0].message.content

def process_data(input_data):
    results = []
    for item in input_data:
        instruction = item.get("instruction")
        output = get_model_output(instruction)
        results.append({"instruction": instruction, "output": output})
    return results

def save_output_data(file_path, data):
    with open(file_path, 'w', encoding='utf-8') as file:
        json.dump(data, file, ensure_ascii=False, indent=2)

def main(input_file_path, output_file_path):
    input_data = read_input_data(input_file_path)
    output_data = process_data(input_data)
    save_output_data(output_file_path, output_data)
    print("Data processing complete.")

if __name__ == "__main__":
    input_file_path = "input.json"
    output_file_path = "output.json"
    main(input_file_path, output_file_path)

Build DPO Training Data
Using the SFT data as the high-quality responses, generate low-quality (rejected) responses with a different task template, then combine them into prompt, chosen, and rejected fields:
[
{"prompt": "xxx", "chosen": "xxx", "rejected": "xxx"},
{"prompt": "xxx", "chosen": "xxx", "rejected": "xxx"},
{"prompt": "xxx", "chosen": "xxx", "rejected": "xxx"}
]

To generate the rejected responses, reuse the SFT batch-invocation script above with a task template that deliberately relaxes the quality requirements.
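The combination step can be sketched as follows, assuming the chosen and rejected generations were each saved in the SFT output format; the file names and the build_dpo_data helper are illustrative, not part of EasyDistill:

```python
import json

def build_dpo_data(chosen_path, rejected_path, out_path):
    """Merge chosen and rejected generations into DPO triples keyed by prompt."""
    with open(chosen_path, encoding="utf-8") as f:
        chosen = {item["instruction"]: item["output"] for item in json.load(f)}
    with open(rejected_path, encoding="utf-8") as f:
        rejected = {item["instruction"]: item["output"] for item in json.load(f)}
    # Keep only prompts that have both a chosen and a rejected response
    triples = [
        {"prompt": p, "chosen": chosen[p], "rejected": rejected[p]}
        for p in chosen
        if p in rejected
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(triples, f, ensure_ascii=False, indent=2)
    return triples
```

Prompts that are missing from either file are dropped rather than paired with an empty response, since DPO requires a complete (chosen, rejected) pair per prompt.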
Train Student Model with SFT
Run the EasyDistill SFT training command:
python easydistill/kd/train.py --config=sft.json

Example sft.json configuration:
{
"job_type": "kd_black_box_api",
"dataset": {
"labeled_path": "sft_train.json",
"template": "chat_template_kd.jinja",
"seed": 42
},
"models": {
"student": "model/Qwen/Qwen2.5-0.5B-Instruct/"
},
"training": {
"output_dir": "result_sft/",
"num_train_epochs": 3,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 8,
"save_steps": 1000,
"logging_steps": 1,
"learning_rate": 2e-5,
"weight_decay": 0.05,
"warmup_ratio": 0.1,
"lr_scheduler_type": "cosine"
}
}

Further Optimize Student Model with DPO
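DPO (Direct Preference Optimization) fine-tunes the student directly on the preference pairs built above, with no separate reward model. Its loss, from the original DPO formulation, increases the margin between the chosen response y_w and the rejected response y_l, scaled by the beta parameter that also appears in the configuration:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Here the reference policy π_ref is the SFT-trained student, which is why the SFT step must run first.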
After SFT, run the DPO training command:
python easydistill/rank/train.py --config=dpo.json

Example dpo.json configuration:
{
"job_type": "rank_dpo_api",
"dataset": {
"labeled_path": "dpo_train.json",
"template": "chat_template_kd.jinja",
"seed": 42
},
"models": {
"student": "result_sft/"
},
"training": {
"output_dir": "result_dpo/",
"num_train_epochs": 3,
"per_device_train_batch_size": 1,
"gradient_accumulation_steps": 8,
"save_steps": 1000,
"logging_steps": 1,
"beta": 0.1,
"learning_rate": 2e-5,
"weight_decay": 0.05,
"warmup_ratio": 0.1,
"lr_scheduler_type": "cosine"
}
}

These steps enable efficient distillation of large language models into smaller student models while maintaining generation quality.
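As a sanity check on either configuration, note that the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. A quick calculation, assuming a single GPU and a hypothetical dataset size (the dataset_size value is an assumption for illustration):

```python
config = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
}
num_gpus = 1
dataset_size = 10000  # hypothetical number of training examples

# Effective batch size per optimizer step
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"] * num_gpus)
steps_per_epoch = dataset_size // effective_batch
total_steps = steps_per_epoch * config["num_train_epochs"]
print(effective_batch, total_steps)  # prints: 8 3750
```

With save_steps set to 1000, this run would produce a few intermediate checkpoints; adjust save_steps if disk space is a concern.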
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.