SKILLRL: Boosting LLM Agents with Skill Distillation and Recursive Evolution
SKILLRL is a framework that distills raw LLM agent trajectories into compact, reusable skills through experience-driven distillation, a hierarchical skill bank, and recursive skill evolution. It achieves 89.9% success on ALFWorld and 72.7% on WebShop while cutting token usage by over 10% compared to memory-based baselines.
SKILLRL Framework Overview
SKILLRL bridges raw experience and policy improvement through three core components, turning isolated LLM agent executions into a reusable skill library that can be abstracted, retrieved, and evolved.
2.1 Three Core Components
Experience-Driven Skill Distillation: Converts diverse trajectories into structured skills, preserving successful paths as demonstrations and synthesizing failed attempts into concise counterfactual lessons.
Hierarchical Skill Bank (SKILLBANK): Organizes skills into a two-level hierarchy: generic skills that capture cross-task strategies, and task-specific skills that encode domain-specific actions, preconditions, and common failure patterns.
Recursive Skill Evolution Mechanism: Dynamically updates the skill bank during reinforcement-learning (RL) training, enabling the skill set and policy to co-evolve.
Core Technical Details
3.1 Experience‑Driven Skill Distillation
Unlike methods that only store raw trajectories, SKILLRL retains both successful and failed trajectories. Successful trajectories are distilled into strategic patterns, while failed trajectories are transformed into counterfactual knowledge that highlights error points, correct reasoning, and preventive principles.
Success Trajectories: Extract strategic patterns that lead to task completion.
Failure Trajectories: Synthesize concise lessons that identify failure points, incorrect reasoning, correct actions, and prevention rules.
Key Advantage: 10-20× token compression while enhancing reasoning utility.
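To make the distillation step concrete, here is a minimal sketch of how a failed trajectory might be compressed into a counterfactual lesson. The `Skill` record, its fields, and the `distill_failure` helper are illustrative assumptions, not the paper's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A distilled skill record. Field names are illustrative, not from the paper."""
    name: str
    kind: str                 # "success_pattern" or "counterfactual_lesson"
    summary: str              # compact strategic pattern or lesson
    prevention_rules: list = field(default_factory=list)

def distill_failure(trajectory: list[str], failure_step: int,
                    correct_action: str, rule: str) -> Skill:
    """Compress a failed trajectory into a counterfactual lesson: keep only
    the error point, the corrective action, and a prevention rule, discarding
    the rest of the raw trajectory (this is where the token compression comes from)."""
    return Skill(
        name=f"lesson_at_step_{failure_step}",
        kind="counterfactual_lesson",
        summary=(f"At step {failure_step} the agent did "
                 f"'{trajectory[failure_step]}'; it should have done "
                 f"'{correct_action}'."),
        prevention_rules=[rule],
    )
```

The compression comes from storing only the lesson rather than the full multi-step trajectory, which is what enables the 10-20× token reduction claimed above.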
3.2 Hierarchical Skill Bank (SKILLBANK)
The skill bank consists of two layers:
Generic Skills: Capture universal strategic principles such as systematic search patterns, state-validation rules, and progress-tracking heuristics.
Task-Specific Skills: Encode domain-specific action sequences, preconditions, constraints, and typical failure modes for each task type.
Retrieval strategy: generic skills provide a baseline of guidance, while task-specific skills are fetched dynamically via semantic similarity.
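The retrieval strategy above can be sketched as follows. This is a toy implementation under stated assumptions: the bag-of-words "embedding" stands in for a real sentence encoder, and the `retrieve` signature is hypothetical:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task: str, generic: list[str], task_specific: list[str], k: int = 2):
    """Generic skills are always included as the base guide; task-specific
    skills are ranked by semantic similarity to the current task and only
    the top-k are attached to the prompt."""
    ranked = sorted(task_specific,
                    key=lambda s: cosine(embed(task), embed(s)),
                    reverse=True)
    return generic + ranked[:k]
```

Keeping generic skills unconditional while gating task-specific skills by similarity is what bounds the prompt length as the bank grows.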
3.3 Recursive Skill Evolution
The skill bank is not static; it evolves through a closed‑loop process:
1. After each validation cycle, monitor the success rate of each skill category.
2. Collect failure trajectories for categories whose success rate falls below a threshold.
3. A teacher model analyses these failures to identify gaps not covered by existing skills.
4. Generate new skills or optimise existing ones, then update SKILLBANK.
Result: a virtuous cycle of agent improvement → new challenges → skill-bank expansion → further improvement.
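The closed-loop process above can be sketched in a few lines. Everything here is an assumption about structure, not the paper's code: `teacher` is a placeholder callable standing in for the teacher model, and `eval_log` is a hypothetical mapping from skill category to per-run outcomes:

```python
def evolve_skill_bank(skill_bank, eval_log, teacher, threshold=0.5):
    """One recursive-evolution cycle (illustrative sketch).
    eval_log maps skill category -> list of (success: bool, trajectory)."""
    for category, runs in eval_log.items():
        success_rate = sum(ok for ok, _ in runs) / len(runs)
        if success_rate >= threshold:
            continue  # category is healthy; leave its skills untouched
        failures = [traj for ok, traj in runs if not ok]
        # Teacher model inspects failures and proposes new or refined skills.
        new_skills = teacher(category, failures)
        skill_bank.setdefault(category, []).extend(new_skills)
    return skill_bank
```

Because the updated bank shapes the next round of rollouts, each cycle feeds the next, which is the "recursive" part of the mechanism.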
Experimental Results
4.1 Main Experiments: ALFWorld and WebShop
SKILLRL achieves 89.9% success on ALFWorld and 72.7% on WebShop, substantially outperforming the best prompt‑based baselines.
Significant gains over prompt baselines: 89.9% on ALFWorld versus the best prompt-based baseline.
Improvement over pure RL: +12.3% on ALFWorld, and over 20% on complex sub-tasks.
Outperforms memory-enhanced RL: +35.2% over Mem0+GRPO.
Surpasses closed-source models: Qwen2.5-7B-based SKILLRL exceeds GPT-4o by 41.9% and Gemini-2.5-Pro by 29.6%.
4.2 Search‑Enhanced QA
On seven search‑enhanced QA benchmarks, SKILLRL attains an average score of 47.1%, beating Search‑R1 (38.5%) and EvolveR (43.1%). Notably, on the multi‑hop reasoning task Bamboogle, it outperforms EvolveR by 19.4%.
4.3 Ablation Studies
Ablation experiments confirm that skill distillation, hierarchical skill banks, and recursive evolution each contribute to the overall performance gains.
In‑Depth Analysis
5.1 Skill‑Bank Evolution
Initial state: 55 skills (12 generic + 43 task-specific).
After training: 100 skills (20 generic + 80 task-specific).
Growth pattern: task-specific skills increase markedly while generic skills grow steadily, ensuring expertise for each task category.
5.2 Context Efficiency
Compared with raw-memory methods (averaging ~1,450 tokens per prompt), SKILLRL maintains an average prompt length below 1,300 tokens, a 10.3% reduction, demonstrating effective mitigation of context bloat.
5.3 Convergence Speed
SKILLRL reaches >80% success within 60 training steps.
Baseline without skill evolution requires ~90 steps to achieve a lower peak.
5.4 Qualitative Cases
WebShop: Retrieves and applies the generic skill "Prioritize Core Keywords" and the task-specific skill "Focus Key Query".
ALFWorld: Coordinates hierarchical skills such as "Progressive Goal Decomposition" and "No Appliance Before Object" to avoid logical traps.
https://arxiv.org/pdf/2602.08234
SkillRL: Evolving Agents via Recursive Skill‑Augmented Reinforcement Learning
https://github.com/aiming-lab/SkillRL
