How to Supercharge Small LLM Agents with ReAct Data Construction and EasyDistill

This guide explains how to build high‑quality agent training data using ReAct trajectories, synthesize difficult samples with a data‑flywheel, and distill the knowledge into small LLMs on Alibaba Cloud PAI, covering teacher model deployment, EasyDistill installation, data generation, task solving, rubric filtering, and final model deployment.

Large language models (LLMs) are increasingly used as agents for multi‑turn reasoning, tool usage, and self‑repair. To enable small models to acquire comparable agent capabilities, the PAI platform provides a ReAct‑based data construction and model distillation workflow using the open‑source EasyDistill toolkit.

Prerequisites

PAI services (DSW, DLC, EAS) must be activated and a default workspace created.

An OSS bucket is required for storing intermediate data and model artifacts.

Core Steps

Deploy Teacher Model: Select a high‑capacity LLM (parameter count ≥100B) from the PAI‑Model Gallery, such as DeepSeek‑V3.2 or GLM‑5, and deploy it. The platform will provide an inference endpoint (API URL) that will be used in subsequent steps.
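Once the service is running, a quick call can confirm that the endpoint is reachable before wiring it into the configuration files below. This is only a sketch: it assumes the deployment exposes an OpenAI-compatible chat completions API, and the endpoint URL, key variable, and model name are placeholders to be replaced with the values shown on the EAS service page.

import os
import requests

# Placeholders: use the invocation URL and token from your EAS service page.
ENDPOINT = "https://<your-eas-service>.<region>.pai-eas.aliyuncs.com/v1/chat/completions"
API_KEY = os.environ["TEACHER_API_KEY"]

# Assumes an OpenAI-compatible chat API; adjust the path and payload if your deployment differs.
resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "deepseek-v3.2", "messages": [{"role": "user", "content": "ping"}]},
    timeout=60,
)
print(resp.status_code, resp.json())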

Install EasyDistill: Clone the EasyDistill repository and install its Python dependencies.

git clone https://github.com/modelscope/easydistill
cd easydistill && pip install -e .  # editable install of the toolkit and its dependencies; see the repository README if your setup differs

Generate Agent Training Data:

Prepare a persona seed file in .jsonl format, e.g.:

{"id": "uuid1", "persona": "A passionate fan of Afrikaans music and die‑hard supporter of Spoegwolf"}
{"id": "uuid2", "persona": "An AI research scientist focused on natural language understanding."}

Create a configuration file configs/agentkd_data_gen.json that specifies:

paths.data_file: path to the persona seed file.

step_models: three agents (ToolSetGenAgent, PolicyTaskAgent, FinalTaskAgent) that call the teacher model (e.g., "model_name": "deepseek-v3.2") with appropriate max_tokens and temperature settings.

processing.max_workers and processing.max_tasks: parallelism and the total number of generated tasks.

api_configs: the API base URL and the environment variable holding the teacher model's key.

logging.task_file_path: output file for the generated virtual tasks (e.g., data/virtual_tool_use_tasks.jsonl).
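For orientation, the sketch below writes such a configuration from Python. The nesting and values mirror the description above but are assumptions; verify the exact schema against the agentkd example configs shipped with EasyDistill, and replace the endpoint URL, key variable, and seed file name (placeholders here) with your own.

import json

# Illustrative sketch only: field names follow the description above; confirm against
# the EasyDistill agentkd example configs before use.
teacher = {"model_name": "deepseek-v3.2", "max_tokens": 4096, "temperature": 0.7}
config = {
    "paths": {"data_file": "data/persona_seeds.jsonl"},  # placeholder seed file path
    "step_models": {
        "ToolSetGenAgent": dict(teacher),
        "PolicyTaskAgent": dict(teacher),
        "FinalTaskAgent": dict(teacher),
    },
    "processing": {"max_workers": 8, "max_tasks": 1000},
    # Placeholder endpoint and key variable for the teacher model deployed earlier.
    "api_configs": {
        "deepseek-v3.2": {"api_url": "https://<your-teacher-endpoint>/v1", "api_key_env": "TEACHER_API_KEY"}
    },
    "logging": {"task_file_path": "data/virtual_tool_use_tasks.jsonl"},
}
with open("configs/agentkd_data_gen.json", "w") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)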

Run the data‑generation job:

python easydistill/agentkd/data_gen.py --config configs/agentkd_data_gen.json
# or
 easydistill --config configs/agentkd_data_gen.json

Solve Generated Tasks:

Configure configs/agentkd_solve_task.json to let the teacher model solve each virtual task. Key fields include step_models.SolveAgent and step_models.MockToolAgent, the same api_configs as before, and paths.data_file pointing to data/virtual_tool_use_tasks.jsonl.
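A comparable sketch for the solving configuration, again with assumed field names and placeholder values that should be checked against the shipped examples:

import json

# Illustrative sketch only: SolveAgent works through each virtual task while MockToolAgent
# simulates tool responses; confirm the exact schema against the EasyDistill examples.
config = {
    "paths": {"data_file": "data/virtual_tool_use_tasks.jsonl"},
    "step_models": {
        "SolveAgent": {"model_name": "deepseek-v3.2", "max_tokens": 8192, "temperature": 0.7},
        "MockToolAgent": {"model_name": "deepseek-v3.2", "max_tokens": 2048, "temperature": 0.3},
    },
    # Same placeholder API settings as in the data-generation step.
    "api_configs": {
        "deepseek-v3.2": {"api_url": "https://<your-teacher-endpoint>/v1", "api_key_env": "TEACHER_API_KEY"}
    },
}
with open("configs/agentkd_solve_task.json", "w") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)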

Execute the solving pipeline:

python easydistill/agentkd/solve_task.py --config configs/agentkd_solve_task.json
# or
 easydistill --config configs/agentkd_solve_task.json

Results are written to logs/solve_output/ as JSON files containing full reasoning traces, tool calls, and observations.
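Before filtering, it can be worth spot-checking a few of these files. The snippet below only assumes that each output is a JSON document in logs/solve_output/; it prints a compact summary so the record structure can be inspected directly.

import json
from pathlib import Path

# Print a compact summary of each solved trajectory file for a quick sanity check.
for path in sorted(Path("logs/solve_output").glob("*.json")):
    record = json.loads(path.read_text(encoding="utf-8"))
    summary = list(record) if isinstance(record, dict) else f"list of {len(record)} entries"
    print(f"{path.name}: {summary}")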

Rubrics Evaluation & Filtering:

Define configs/agentkd_rubrics_filter.json to score each solved trajectory with a RubricsAgent (e.g., "model_name": "deepseek-v3.2", a low temperature, and "solution_top_k": 3).

Set paths.solution_path to the directory containing the solved outputs and dataset.labeled_path to the desired filtered output file (e.g., data/filtered_train.jsonl).
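As with the previous steps, a hedged sketch of the filtering configuration; the field names are assumptions based on the description above, and a low temperature is used to keep the grading consistent.

import json

# Illustrative sketch only: RubricsAgent scores each trajectory; solution_top_k controls how
# many candidate solutions per task are considered. Verify against the EasyDistill examples.
config = {
    "step_models": {
        "RubricsAgent": {
            "model_name": "deepseek-v3.2",
            "max_tokens": 2048,
            "temperature": 0.1,
            "solution_top_k": 3,
        },
    },
    "paths": {"solution_path": "logs/solve_output/"},
    "dataset": {"labeled_path": "data/filtered_train.jsonl"},
    # Same placeholder API settings as before.
    "api_configs": {
        "deepseek-v3.2": {"api_url": "https://<your-teacher-endpoint>/v1", "api_key_env": "TEACHER_API_KEY"}
    },
}
with open("configs/agentkd_rubrics_filter.json", "w") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)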

Run the filter:

python easydistill/agentkd/rubrics.py --config configs/agentkd_rubrics_filter.json
# or
 easydistill --config configs/agentkd_rubrics_filter.json

The process retains only PASS‑rated trajectories, producing a clean training set.

Model Distillation (Student Training):

Create configs/agentkd_distill.json with the following sections, pointing dataset.labeled_path at the filtered training file produced in the previous step:

{
  "job_type": "agentkd_distill",
  "dataset": {"labeled_path": "data/tool_use_training_data.json"},
  "models": {"student": "Qwen/Qwen2.5-7B-Instruct", "trust_remote_code": true},
  "training": {
    "output": {"output_dir": "output/tool_use_sft", "logging_steps": 10, "save_steps": 500, "overwrite_output_dir": true},
    "dataset": {"cutoff_len": 8192, "dataloader_num_workers": 4},
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "gradient_checkpointing": true,
    "bf16": true,
    "learning_rate": 1e-5,
    "warmup_ratio": 0.1
  }
}

Launch the distillation job:

python easydistill/agentkd/train.py --config configs/agentkd_distill.json
# or
 easydistill --config configs/agentkd_distill.json

The student model learns the full ReAct loop (thought → action → observation) while being much smaller and cheaper to run.

Online Deployment: Deploy the distilled model with PAI‑EAS, which handles environment setup, performance tuning, and cost management. Reference deployment guide: https://help.aliyun.com/zh/pai/user-guide/deploy-an-llm/
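After deployment, the distilled model can typically be called like any OpenAI-compatible endpoint. The sketch below uses the openai Python SDK; the base URL, token, and served model name are placeholders taken from the deployed EAS service, and the assumption of an OpenAI-compatible chat API should be checked against your deployment settings.

from openai import OpenAI

# Placeholders: take the base URL and token from the deployed EAS service's overview page.
client = OpenAI(
    base_url="https://<your-student-service>.<region>.pai-eas.aliyuncs.com/v1",
    api_key="<your-eas-token>",
)

response = client.chat.completions.create(
    model="tool_use_sft",  # placeholder name for the deployed distilled student
    messages=[{"role": "user", "content": "Check tomorrow's weather in Hangzhou and suggest what to pack."}],
)
print(response.choices[0].message.content)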

Agent workflow diagram: data generation with the teacher LLM (ReAct agent) and model distillation with EasyDistill on PAI.
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
