A First Systematic Survey of Agent Skills: Taxonomy, Techniques, and Applications
This survey analyzes the emerging field of Agent Skills, defining a formal skill model, categorizing acquisition pathways, detailing retrieval strategies, and outlining a five‑stage evolution process, while highlighting large‑scale skill repositories and their implications for AI product design.
What Is a Skill and How Is It Stored?
The paper defines a skill as a triple S = (M, R, C), where M is the main instruction document, R is auxiliary resources (templates, scripts, references), and C is the trigger condition. Skills are classified into three resource types: pure‑text (readable but less deterministic), pure‑code (executable but harder to maintain), and hybrid (combining readability and executability, yet most complex to keep consistent).
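As a rough illustration of the triple, a skill record could be modeled as below. This is a minimal sketch, not the paper's implementation: the class and field names (`Skill`, `instructions`, `resources`, `trigger`, `ResourceType`) are assumptions chosen for readability, and the trigger condition C is simplified to a predicate over the task text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class ResourceType(Enum):
    """The three resource types named in the survey."""
    PURE_TEXT = "pure-text"
    PURE_CODE = "pure-code"
    HYBRID = "hybrid"


@dataclass
class Skill:
    """Hypothetical container for S = (M, R, C)."""
    name: str
    instructions: str                       # M: main instruction document
    trigger: Callable[[str], bool]          # C: trigger condition (assumed: a predicate on the task)
    resources: Dict[str, str] = field(default_factory=dict)  # R: templates, scripts, references
    resource_type: ResourceType = ResourceType.PURE_TEXT


# Example: a hybrid skill that activates on CSV-related tasks.
csv_skill = Skill(
    name="csv-report",
    instructions="Load the CSV, compute summary statistics, fill in the template.",
    trigger=lambda task: "csv" in task.lower(),
    resources={"template.md": "# Report\n{summary}"},
    resource_type=ResourceType.HYBRID,
)
```

Keeping the trigger separate from the instructions mirrors the definition: a skill can be stored and indexed without being loaded until C matches.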
How Are Skills Acquired? Four Complementary Paths
1. Human‑written expert skills (precise but slow) – doctors write clinical protocols, engineers write troubleshooting manuals, policy experts write audit standards. These provide high‑precision seed skills.
2. Extraction from experience (most common) – after an agent completes a task, reusable patterns are distilled. Examples include Voyager storing successful Minecraft actions as code skills, Reflexion extracting correction rules from failures, and ExpeL compressing multiple successes and failures into high‑level lessons. The process involves filtering, abstraction, memory recomposition, and workflow packaging.
3. On‑the‑fly generation for new tasks – when facing an unseen requirement, the LLM generates a candidate skill, which is then kept, modified, or discarded based on execution results. Representative systems are CREATOR and ToolMakers.
4. Mining external resources – skills are harvested from documents, code repositories, Kaggle solutions, API docs, etc., which is especially useful for cold‑start scenarios.
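The keep, modify, or discard decision in path 3 can be sketched as a small loop. This is an illustrative outline only and does not reproduce how CREATOR or any other cited system works; `generate`, `execute`, and `repair` are hypothetical callables standing in for LLM calls and a sandboxed executor.

```python
from typing import Callable, Optional, Tuple


def acquire_skill(
    task: str,
    generate: Callable[[str], str],                   # hypothetical: LLM drafts a candidate skill
    execute: Callable[[str, str], Tuple[bool, str]],  # hypothetical: runs it, returns (ok, feedback)
    repair: Callable[[str, str], str],                # hypothetical: LLM revises using feedback
    max_repairs: int = 2,
) -> Optional[str]:
    """Return a skill that passed execution, or None if it was discarded."""
    candidate = generate(task)
    for attempt in range(max_repairs + 1):
        ok, feedback = execute(candidate, task)
        if ok:
            return candidate                          # keep
        if attempt < max_repairs:
            candidate = repair(candidate, feedback)   # modify and retry
    return None                                       # discard
```

Bounding the number of repairs is a design choice made here so that a bad candidate cannot loop indefinitely; the source does not specify a retry budget.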
Why More Skills Aren’t Enough: Retrieval and Selection Bottlenecks
When a skill library grows (SkillsMP reports over 700,000 skills), the core challenge shifts from “does the skill exist?” to “can the right skill be retrieved and activated at the right moment?”. The paper notes that retrieval recall does not equal execution success because a semantically relevant skill may be inapplicable in the current environment.
Four retrieval strategies are identified:
Semantic vector search: map task and skill descriptions into a shared embedding space and retrieve nearest neighbors; most common but semantic similarity ≠ applicability.
Keyword search: exact match on skill names or metadata; simple but unreliable, useful as a supplemental filter.
Generative retrieval: let the model directly generate a skill ID, integrating retrieval into reasoning; coverage and correctness are hard to guarantee.
Structured search: exploit hierarchical or dependency structures inside the skill store to narrow the search space; suited for large, well‑organized libraries.
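To make the gap between similarity and applicability concrete, the sketch below ranks skills by similarity but first drops any skill whose requirements the current environment cannot meet. Everything here is an assumption for illustration: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, and the `requires` field and `available_tools` set are hypothetical ways of encoding environment constraints.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(task: str, skills: Dict[str, dict], available_tools: Set[str],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank applicable skills by similarity to the task description."""
    query = embed(task)
    scored = []
    for name, spec in skills.items():
        # Applicability check: skip skills the environment cannot run.
        if not spec["requires"] <= available_tools:
            continue
        scored.append((cosine(query, embed(spec["description"])), name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k] if score > 0]


skills = {
    "resize-image-gpu": {"description": "resize image files quickly", "requires": {"gpu"}},
    "resize-image-cpu": {"description": "resize image files", "requires": set()},
    "send-email": {"description": "send an email message", "requires": {"smtp"}},
}
# Without a GPU, the GPU skill is relevant but not applicable, so it is never returned.
print(retrieve("resize these image files", skills, available_tools=set()))
```

Filtering before ranking is one possible ordering; a system could equally rank first and check applicability at activation time.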
Skill Evolution: Continuous Improvement Beyond Storage
The survey separates “skill acquisition” (first creation) from “skill evolution” (ongoing refinement). Evolution comprises five stages:
Revision – after a failure, the skill’s content is edited; Memento‑Skills rewrites instructions and uses unit tests to decide whether to keep the change.
Validation – modified skills must pass tests before entering the official library; SkillWeaver generates test cases for Web‑Agent APIs, while PSN introduces maturity thresholds and rollback verification.
Policy Coupling – the skill store becomes part of policy training; SkillRL jointly optimizes policies and the skill repository during reinforcement learning.
Repository Evolution – evolution expands from single skills to whole‑library governance; SkillClaw aggregates execution traces from multiple users, validates them, and synchronizes updates to a shared repository.
Runtime Governance – even executable skills may be unsafe; the paper warns about “poisoned skills” where third‑party skill documents hide malicious logic that agents might execute as trusted instructions.
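The revision and validation stages can be pictured as a gate in front of the library: a revised skill is admitted only if it passes its tests, and earlier versions stay available for rollback. This is a minimal sketch under stated assumptions, not the mechanism of Memento‑Skills, SkillWeaver, or PSN; the `SkillLibrary` class, its methods, and the representation of tests as plain predicates are all hypothetical.

```python
from typing import Callable, Dict, List


class SkillLibrary:
    """Hypothetical versioned store with a validation gate."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def current(self, name: str) -> str:
        return self._versions[name][-1]

    def propose(self, name: str, new_body: str,
                tests: List[Callable[[str], bool]]) -> bool:
        """Admit a new or revised skill only if every test passes."""
        if all(test(new_body) for test in tests):
            self._versions.setdefault(name, []).append(new_body)
            return True
        return False  # rejected: the library is left unchanged

    def rollback(self, name: str) -> str:
        """Drop the latest version if an earlier one exists."""
        history = self._versions[name]
        if len(history) > 1:
            history.pop()
        return history[-1]


lib = SkillLibrary()
must_test = [lambda body: "run tests" in body]
lib.propose("deploy", "run tests, then deploy", must_test)
lib.propose("deploy", "deploy immediately", must_test)  # fails validation, not stored
```

In practice the tests would execute the skill in a sandbox rather than inspect its text; string predicates are used here only to keep the example self-contained.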
Implications for Building AI Products
The survey argues that the next competitive edge for agents is not larger models but stronger skill‑management capabilities. Skills act as muscle memory; without them, even the smartest model cannot perform efficiently.
Lifecycle management—continuous retrieval, validation, evolution, and governance—is more critical than static skill storage. Product architectures therefore need not only a skill store but also retrieval engines, testing frameworks, version control, and security audits.
Large‑scale skill ecosystems are already emerging: SkillNet (>300,000 skills), ClawHub (>40,000), and SkillsMP (>700,000) indicate that “skills” are becoming an independent infrastructure layer rather than a peripheral feature of a single agent product.
Paper title: A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
Paper link: https://arxiv.org/abs/2605.07358v1
GitHub: https://github.com/JayLZhou/Awesome-Agent-Skills