SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution

SKILLRL introduces a novel framework that transforms raw LLM agent trajectories into compact, reusable skills via experience‑driven distillation, hierarchical skill banks, and recursive skill evolution, achieving up to 90% success on ALFWorld and 73% on WebShop while reducing token usage by over 10% compared to memory‑based baselines.


SKILLRL Framework Overview

SKILLRL bridges raw experience and policy improvement through three core components, turning isolated LLM agent executions into a reusable skill library that can be abstracted, retrieved, and evolved.

2.1 Three Core Components

Experience‑Driven Skill Distillation: Converts diverse trajectories into structured skills, preserving successful paths as demonstrations and synthesizing failed attempts into concise counterfactual lessons.

Hierarchical Skill Bank (SKILLBANK): Organizes skills into a two‑level hierarchy of generic skills that capture cross‑task strategies and task‑specific skills that encode domain‑specific actions, preconditions, and common failure patterns.

Recursive Skill Evolution Mechanism: Dynamically updates the skill bank during reinforcement‑learning (RL) training, enabling the skill set and policy to co‑evolve.

Figure 1: SKILLRL overall architecture and performance comparison

Core Technical Details

3.1 Experience‑Driven Skill Distillation

Unlike methods that only store raw trajectories, SKILLRL retains both successful and failed trajectories. Successful trajectories are distilled into strategic patterns, while failed trajectories are transformed into counterfactual knowledge that highlights error points, correct reasoning, and preventive principles.

Success Trajectories: Extract strategic patterns that lead to task completion.

Failure Trajectories: Synthesize concise lessons that identify failure points, incorrect reasoning, correct actions, and prevention rules.

Key Advantage: 10‑20× token compression while enhancing reasoning utility.
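The distillation step described above can be sketched as follows. This is a minimal illustration, not the paper's actual interface: the `Skill` dataclass, the prompt wording, and the `summarize` callable (standing in for an LLM call) are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    kind: str    # "demonstration" (from success) or "counterfactual" (from failure)
    lesson: str  # compact natural-language skill text

def distill(trajectory: str, succeeded: bool, summarize) -> Skill:
    """Turn one raw trajectory into a compact skill record.

    Successful runs are preserved as strategic demonstrations;
    failed runs are synthesized into counterfactual lessons that
    name the error point, the correct action, and a prevention rule.
    """
    if succeeded:
        prompt = ("Extract the strategic pattern that led to task "
                  f"completion:\n{trajectory}")
        return Skill("demonstration", summarize(prompt))
    prompt = ("Identify the failure point, the incorrect reasoning, "
              f"the correct action, and a prevention rule:\n{trajectory}")
    return Skill("counterfactual", summarize(prompt))
```

Because each skill stores only the summarized lesson rather than the raw trajectory, this is where the 10‑20× token compression comes from.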

3.2 Hierarchical Skill Bank (SKILLBANK)

The skill bank consists of two layers:

Generic Skills: Capture universal strategic principles such as systematic search patterns, state‑validation rules, and progress‑tracking heuristics.

Task‑Specific Skills: Encode domain‑specific action sequences, preconditions, constraints, and typical failure modes for each task type.

Retrieval strategy: generic skills provide a base guide, while task‑specific skills are fetched dynamically via semantic similarity.
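The two‑level retrieval strategy can be sketched as below. The `embed` callable is a hypothetical text‑to‑vector function (the paper's actual embedding model is not specified here), and cosine similarity stands in for "semantic similarity":

```python
import math
from collections import defaultdict

class SkillBank:
    """Two-level skill store: generic skills are always included in the
    prompt; task-specific skills are fetched by embedding similarity."""

    def __init__(self, embed):
        self.embed = embed                  # text -> list[float]
        self.generic = []                   # cross-task strategic principles
        self.specific = defaultdict(list)   # task_type -> [(vector, skill_text)]

    def add_specific(self, task_type: str, skill: str) -> None:
        self.specific[task_type].append((self.embed(skill), skill))

    def retrieve(self, task_type: str, query: str, k: int = 3):
        q = self.embed(query)

        def cos(a, b):
            num = sum(x * y for x, y in zip(a, b))
            den = (math.sqrt(sum(x * x for x in a))
                   * math.sqrt(sum(y * y for y in b)))
            return num / den if den else 0.0

        # Generic skills form the base guide; top-k similar
        # task-specific skills are appended dynamically.
        ranked = sorted(self.specific[task_type],
                        key=lambda pair: cos(q, pair[0]), reverse=True)
        return list(self.generic) + [skill for _, skill in ranked[:k]]
```

In a real system `embed` would be an embedding-model call and the ranked list would typically be served from a vector index rather than a linear scan.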

3.3 Recursive Skill Evolution

The skill bank is not static; it evolves through a closed‑loop process:

1. After each validation cycle, monitor the success rate of each skill category.

2. Collect failure trajectories for categories whose success rate falls below a threshold.

3. A teacher model analyzes these failures to identify gaps not covered by existing skills.

4. Generate new skills or optimize existing ones, then update SKILLBANK.

Result: a virtuous cycle of agent improvement → new challenges → skill‑bank expansion → further improvement.
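One iteration of this closed loop can be sketched as follows. The dict-based skill bank, the evaluation-log shape, and the `teacher` callable (standing in for the teacher-model call) are illustrative assumptions:

```python
def evolve(skill_bank: dict, eval_log: dict, teacher, threshold: float = 0.6):
    """Run one recursive-evolution cycle over the skill bank.

    eval_log maps each skill category to its validation runs:
        {category: [(trajectory, succeeded), ...]}  (non-empty lists)
    Categories whose success rate falls below `threshold` get their
    failures analyzed by the teacher, which proposes new skills.
    """
    for category, runs in eval_log.items():
        rate = sum(ok for _, ok in runs) / len(runs)
        if rate >= threshold:
            continue  # category is healthy; leave its skills alone
        failures = [traj for traj, ok in runs if not ok]
        # Teacher model inspects failures and proposes new or revised skills.
        for new_skill in teacher(category, failures):
            skill_bank.setdefault(category, []).append(new_skill)
    return skill_bank
```

Called after each validation cycle during RL training, this is what lets the skill set and the policy co-evolve.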

Experimental Results

4.1 Main Experiments: ALFWorld and WebShop

SKILLRL achieves 89.9% success on ALFWorld and 72.7% on WebShop, substantially outperforming the best prompt‑based baselines.

Improvement over prompt baselines: SKILLRL's 89.9% on ALFWorld substantially exceeds the best prompt‑based baseline.

Improvement over pure RL: +12.3% on ALFWorld, and more than 20% on complex sub‑tasks.

Outperforms memory‑enhanced RL: +35.2% over Mem0+GRPO.

Surpasses closed‑source models: Qwen2.5‑7B‑based SKILLRL exceeds GPT‑4o by 41.9% and Gemini‑2.5‑Pro by 29.6%.

4.2 Search‑Enhanced QA

On seven search‑enhanced QA benchmarks, SKILLRL attains an average score of 47.1%, beating Search‑R1 (38.5%) and EvolveR (43.1%). Notably, on the multi‑hop reasoning task Bamboogle, it outperforms EvolveR by 19.4%.

4.3 Ablation Studies

Ablation experiments confirm that skill distillation, hierarchical skill banks, and recursive evolution each contribute to the overall performance gains.

In‑Depth Analysis

5.1 Skill‑Bank Evolution

Initial state: 55 skills (12 generic + 43 task‑specific).

After training: 100 skills (20 generic + 80 task‑specific).

Growth pattern: task‑specific skills increase markedly while generic skills grow steadily, ensuring expertise for each task category.

5.2 Context Efficiency

Compared with raw‑memory methods (average ~1,450 tokens), SKILLRL maintains an average prompt length under 1,300 tokens, a 10.3% reduction that demonstrates effective mitigation of context bloat.
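The reported saving follows directly from the two averages quoted above:

```python
baseline_tokens = 1450  # raw-memory methods, average prompt length (from the paper)
skillrl_tokens = 1300   # SKILLRL average prompt length (stated upper bound)

reduction = (baseline_tokens - skillrl_tokens) / baseline_tokens
print(f"{reduction:.1%}")  # → 10.3%
```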

5.3 Convergence Speed

SKILLRL reaches >80% success within 60 training steps.

Baseline without skill evolution requires ~90 steps to achieve a lower peak.

5.4 Qualitative Cases

WebShop: Retrieves and applies the generic skill "Prioritize Core Keywords" and the task‑specific skill "Focus Key Query".

ALFWorld: Coordinates hierarchical skills such as "Progressive Goal Decomposition" and "No Appliance Before Object" to avoid logical traps.

Paper: SkillRL: Evolving Agents via Recursive Skill‑Augmented Reinforcement Learning (https://arxiv.org/pdf/2602.08234)
Code: https://github.com/aiming-lab/SkillRL
Figure 2: Detailed SKILLRL workflow
Figure 3: Skill‑Bank evolution during RL training
Figure 4: Prompt length comparison
Figure 5: Training curves with and without skill evolution
Figure 6: WebShop and ALFWorld case studies
Tags: reinforcement learning, LLM agents, skill distillation, hierarchical skill bank, SKILLRL
Written by PaperAgent

Daily updates, analyzing cutting-edge AI research papers