Artificial Intelligence 12 min read

How EasyDistill Cuts LLM Costs: Mastering DistilQwen-ThoughtX on Alibaba Cloud

EasyDistill, an open-source framework from Alibaba Cloud PAI, streamlines knowledge distillation for large language models, introducing the DistilQwen-ThoughtX series with variable-length chain-of-thought reasoning, and provides comprehensive best-practice guidance for training, fine-tuning, evaluation, compression, and deployment via the PAI-ModelGallery.

Alibaba Cloud Big Data AI Platform

Jun 13, 2025

How EasyDistill Cuts LLM Costs: Mastering DistilQwen-ThoughtX on Alibaba Cloud

As large language models (LLMs) grow in size and computational demand, efficiently applying them becomes a critical challenge. Alibaba Cloud PAI released the open-source EasyDistill framework to simplify LLM knowledge distillation, dramatically reducing compute cost while preserving performance.

DistilQwen-ThoughtX Series

The DistilQwen-ThoughtX models incorporate innovative variable-length chain-of-thought (CoT) reasoning that adapts the number of inference steps to task difficulty, avoiding the overthinking problem of traditional CoT methods. Built on the OmniThought dataset containing 2 million annotated reasoning chains, the models also use inference redundancy (RV) and cognitive difficulty (CD) optimizations. DistilQwen-ThoughtX-32B achieves state‑of‑the‑art results on complex reasoning benchmarks, even surpassing models trained on proprietary datasets.

PAI‑ModelGallery Support

PAI‑ModelGallery is a one‑stop AI platform that integrates high‑quality pre‑trained models from global open‑source communities. It enables zero‑code or SDK‑based training, evaluation, compression, and rapid deployment of models, greatly simplifying the AI development workflow for developers and enterprises.

Environment Requirements

Supported regions: Beijing, Shanghai, Shenzhen, Hangzhou, Ulanqab, Singapore, etc.

Resource configuration:

Training DistilQwen-ThoughtX‑7B requires at least an A10 GPU (24 GB VRAM) or higher.

Training DistilQwen-ThoughtX‑32B requires at least a GU108 GPU or higher.

Deployment of the 7B model works on a single P100, T4, or V100; recommended GU30 or A10.

Deployment of the 32B model needs dual GU60 or four A10 GPUs; recommended four‑card GU60 or 8‑card V100‑32G.

Using the Model

Log in to the PAI console, navigate to Quick Start → Model Gallery (https://x.sm.cn/CZTmz7b), select a DistilQwen-ThoughtX model card (e.g., DistilQwen-ThoughtX‑7B), and follow the UI to start training, evaluation, or deployment.

Model Deployment

PAI provides pre‑configured deployment options for DistilQwen-ThoughtX‑7B, including SGLang, Blade LLM, VLLM, and Transformers. Deployment is zero‑code: click “Deploy” to launch the model on the PAI‑EAS inference service. The Transformers deployment supports a ChatLLM WebUI for real‑time interaction, and the service is also compatible with OpenAI‑style API calls.

Model Fine‑Tuning

PAI offers two fine‑tuning algorithms for DistilQwen-ThoughtX‑7B:

SFT (Supervised Fine‑Tuning) – accepts JSON lines with instruction and output fields.

DPO (Direct Preference Optimization) – accepts JSON lines with prompt, chosen, and rejected fields.

[
  {
    "instruction": "You are a cardiologist...",
    "output": "...advice..."
  },
  {
    "instruction": "You are a pulmonologist...",
    "output": "...advice..."
  }
]

[
  {
    "prompt": "Could you please hurt me?",
    "chosen": "Sorry, I can't do that.",
    "rejected": "I cannot hurt you..."
  },
  {
    "prompt": "That guy stole my tool...",
    "chosen": "You shouldn't have done that...",
    "rejected": "That's understandable..."
  }
]

Upload prepared data to an OSS bucket, ensure GPU resources (A10 with 24 GB VRAM) are available, and start the fine‑tuning job.

Model Evaluation

PAI provides evaluation algorithms for both custom and public datasets. Metrics include BLEU, ROUGE‑L, and an expert‑mode judge model. Users supply a JSONL file where each line contains question and answer fields.

Public benchmark suites supported: MMLU, TriviaQA, HellaSwag, GS, M8K, C‑Eval, TruthfulQA, with more being added.

Model Compression

After training, models can be quantized/compressed to reduce deployment resource usage. Create a compression task in the training UI, configure compression method and resources, then run the task. Once completed, deploy the compressed model with a single click.

Conclusion

The EasyDistill framework and DistilQwen-ThoughtX series demonstrate the huge potential of LLMs in inference scenarios. By combining fine-grained CoT data classification with black‑box knowledge distillation, DistilQwen-ThoughtX substantially improves reasoning performance while cutting compute costs. The Alibaba Cloud PAI platform offers end‑to‑end support, making model training, fine‑tuning, evaluation, compression, and deployment accessible to developers and enterprises.

Related Resources

EasyDistill framework: https://x.sm.cn/BJztNnv

DistilQwen-ThoughtX introduction: https://x.sm.cn/1GWgAXP

DistilQwen 2.5 introduction: https://x.sm.cn/DJDynIL

Distill DeepSeek‑R1 models: https://x.sm.cn/AnJ5Th9

LLM data augmentation & distillation solution: https://x.sm.cn/37nZqAr

PAI Model Gallery: https://x.sm.cn/GTvukU6

PAI Python SDK (GitHub): https://x.sm.cn/Bw0J5O3

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

LLM AI inference knowledge distillation

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.