Agent Skills Survey: How Process Knowledge Becomes Technical Debt
The recent arXiv survey on Agent Skills maps the full lifecycle of skills—representation, acquisition, retrieval, and evolution—and warns that unchecked growth can turn a valuable process asset into technical debt, urging teams to enforce admission quality, robust routing, versioning, testing, and retirement mechanisms.
Conclusion
The survey’s most valuable contribution is embedding Skills into a complete lifecycle: how to write, acquire, select, modify, and retire them.
Tools answer "can we do it?"; Skills answer "how should we do it?". Governance must cover admission, testing, versioning, permissions, and rollback.
Skills can originate from expert hand‑writing, execution traces, on‑the‑fly generation, or external corpora; the more sources, the stricter the admission gate should be.
As the Skill library grows, the problem shifts from "does it exist?" to "is it safe to use now?".
Adding Skills is easy; verifying, deprecating, merging, and preventing malicious third‑party Skills is hard.
Alongside the earlier work on Goal, Memory, Harness, and Claude Code, Skills form the "how to do it next time" layer.
First Half and Second Half
Read purely as a literature review, the paper yields four keywords: represent, acquire, retrieve, evolve. The author argues that the engineering question is more relevant: how to embed a model's larger actions into a workflow that is observable, verifiable, and can be rolled back.
Initially, MCP (Model Context Protocol) solves "what can be connected" while Skills solve "how to accomplish the task after connection". Later, the discussion expands to how a Skill is packaged: SKILL.md, its description field, and the references/, assets/, and scripts/ directories, which move repetitive tasks from chat prompts to the file system.
Where Skills Belong
The author places Skills in the middle layer of the Agent stack: the process‑asset layer where team experience becomes runtime knowledge. When the library expands from a few to hundreds of Skills, the question changes from "can we write it?" to "can we still trust it?".
Putting Skills Back in Their Proper Place
Many first‑time readers treat a Skill as a "high‑level prompt". The paper clarifies that a Skill is a reusable procedural asset that tells the Agent not only *what* to do but also *when* to do it, *which steps* to follow, *what resources* to use, *which failure modes* to avoid, and *how* to judge completion.
The specification abstracts a Skill as a triple, S = (M, R, C), where M is the main instruction document (e.g., SKILL.md), R is the set of auxiliary resources (templates, scripts, reference docs), and C is the trigger condition that decides when the Skill should be loaded.
A concrete Skill looks like a small workspace: an entry file plus optional scripts/, references/, and assets/ directories. The description field must state the purpose, pre-conditions, and usage boundaries, because it is what the Agent reads first. The rest of the Skill is loaded only when needed, which is what the author calls "progressive disclosure".
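The triple S = (M, R, C) can be pictured as a small data structure. The following is a minimal sketch, not the specification's own schema: the class name `Skill`, its field names, and the helper `matching_skills` are all hypothetical, and the trigger is reduced to a plain predicate over the task text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical names throughout: this mirrors S = (M, R, C), not a real library.
@dataclass
class Skill:
    name: str
    description: str                  # short routing text, read before anything else
    instructions: str                 # M: body of the main document, e.g. SKILL.md
    trigger: Callable[[str], bool]    # C: decides when the Skill should be loaded
    # R: auxiliary resources, keyed by directory (scripts, references, assets)
    resources: Dict[str, List[str]] = field(default_factory=dict)


def matching_skills(skills: List[Skill], task: str) -> List[str]:
    """Return the names of Skills whose trigger condition accepts the task."""
    return [s.name for s in skills if s.trigger(task)]


pr_review = Skill(
    name="pr-review",
    description="Review a pull request against team guidelines. Not for release approval.",
    instructions="1. Read the diff. 2. Check tests. 3. Summarise risks.",
    trigger=lambda task: "pull request" in task.lower(),
    resources={
        "references": ["references/style-guide.md"],
        "scripts": ["scripts/run_lint.sh"],
    },
)
```

A real trigger would likely be richer than a substring test; the point is only that M, R, and C are separate parts that can be inspected and tested independently.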
First Debt: Skills Arrive Too Fast, Validation Lags
The paper categorises Skill acquisition into four sources: expert hand‑writing, experience extraction, on‑the‑fly generation, and external corpus mining. In practice:
Expert‑written Skills suit high‑value, high‑risk, well‑bounded processes (security reviews, releases, DB migrations).
Experience extraction turns execution traces into reusable patterns (e.g., Voyager, Reflexion, ExpeL, Memento‑Skills).
On‑the‑fly generation enables cold‑start tasks but can flood the library with low‑quality, unverified Skills.
External corpus mining repackages existing docs, code, APIs, and runbooks into Skills, addressing the common enterprise problem of scattered knowledge.
The survey highlights the notion of "admission quality": if Skill generation outpaces verification, low‑quality Skills accumulate, increasing retrieval noise and making trustworthy Skills harder to find.
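An admission gate can be sketched as a function that lists the reasons a candidate Skill should not enter the library. This is an illustration under stated assumptions: the `Candidate` fields, the three checks, and the rule that every non-expert source requires human review are hypothetical choices, not rules taken from the survey.

```python
from dataclasses import dataclass
from typing import List

# Assumption: only expert-written Skills may skip a separate review step.
SOURCES_NEEDING_REVIEW = {"extracted", "generated", "external"}


@dataclass
class Candidate:
    name: str
    source: str        # "expert", "extracted", "generated", or "external"
    description: str
    has_tests: bool    # does the Skill ship with a verification hook?
    reviewed: bool     # has a human signed off?


def admission_problems(candidate: Candidate) -> List[str]:
    """List the reasons a candidate should be rejected; empty means admit."""
    problems = []
    if not candidate.description.strip():
        problems.append("missing description")
    if not candidate.has_tests:
        problems.append("no verification hook")
    if candidate.source in SOURCES_NEEDING_REVIEW and not candidate.reviewed:
        problems.append("unreviewed non-expert source")
    return problems


def admit(candidate: Candidate) -> bool:
    """Admit a candidate only when no admission problem is found."""
    return not admission_problems(candidate)
```

Returning the list of problems rather than a bare boolean keeps rejections explainable, which matters once generation outpaces manual review.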
Practical advice: start with high‑frequency, well‑bounded, verifiable processes (PR review, unit‑test fixes, release checks, incident post‑mortems, DB migration checks, internal SDK guidelines). Avoid broad, ill‑defined tasks like "help me design the whole architecture" as initial Skills.
Second Debt: Findable but Should Not Execute
When the library is small, simple name‑based lookup suffices. As it scales (SkillNet, SkillHub, SkillsMP, etc.), retrieval must be separated from selection:
Retrieval methods include vector search, keyword matching, generative retrieval (generating Skill IDs directly), and structured graph queries.
Selection must consider current state, pre‑conditions, tool permissions, cost, latency, historical success rate, composability, and fallback paths.
Example: a "publish service" Skill may be semantically relevant, but if the current branch fails CI, approvals are missing, or the target environment is production, the Skill should not fire.
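The split between retrieval and selection can be sketched in two functions: one ranks Skills by relevance, the other checks whether a relevant Skill may run in the current state. This is a minimal sketch: `SkillEntry`, the keyword-overlap scoring, and the state keys `ci_passed`, `approved`, and `environment` are hypothetical stand-ins for whatever a real system tracks.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class SkillEntry:
    name: str
    keywords: FrozenSet[str]
    required_state: Dict[str, object]   # pre-conditions that must hold
    blocked_environments: FrozenSet[str]


def retrieve(library: List[SkillEntry], query: str) -> List[SkillEntry]:
    """Retrieval: rank Skills by keyword overlap with the query (relevance only)."""
    words = set(query.lower().split())
    scored = [(len(words & s.keywords), s) for s in library]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [s for score, s in ranked if score > 0]


def select(
    candidates: List[SkillEntry], state: Dict[str, object]
) -> Tuple[Optional[SkillEntry], List[str]]:
    """Selection: return the first candidate allowed to run, plus rejection reasons."""
    reasons = []
    for skill in candidates:
        unmet = [k for k, v in skill.required_state.items() if state.get(k) != v]
        if state.get("environment") in skill.blocked_environments:
            unmet.append("environment")
        if not unmet:
            return skill, reasons
        reasons.append(f"{skill.name}: blocked by {', '.join(unmet)}")
    return None, reasons


publish = SkillEntry(
    name="publish-service",
    keywords=frozenset({"publish", "service", "deploy"}),
    required_state={"ci_passed": True, "approved": True},
    blocked_environments=frozenset({"production"}),
)
```

With a failing CI run in the state, `retrieve` still returns `publish-service`, but `select` returns no Skill and reports which pre-condition blocked it.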
Third Debt: Adding Without Decommissioning
Skills are attractive because they can evolve. However, evolution introduces risks:
Keeping those risks in check requires source traceability, reviewability, side-effect isolation, test-pass gating, version rollback, and retirement of stale or repeatedly failing Skills.
Poisoned Skills—malicious third‑party assets that embed harmful logic—are a concrete threat mentioned in the paper.
Concrete workflow (Memento‑Skills example): read the Skill, execute the task, collect feedback, perform failure attribution, rewrite prompts or scripts, then gate entry with unit tests and rollback checks before acceptance.
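The final gate in that workflow, accepting a rewritten Skill only if its tests pass and keeping a way back, can be sketched as a small version history. This is an assumption-laden illustration, not the Memento-Skills implementation: `SkillVersions` and its methods are hypothetical, and a test is reduced to a predicate over the Skill text.

```python
from typing import Callable, List


class SkillVersions:
    """Hypothetical in-memory version history for a single Skill."""

    def __init__(self, initial: str) -> None:
        self._history: List[str] = [initial]

    @property
    def current(self) -> str:
        return self._history[-1]

    def propose(self, revised: str, tests: List[Callable[[str], bool]]) -> bool:
        """Accept the revision only if every test passes; otherwise keep the current version."""
        if all(test(revised) for test in tests):
            self._history.append(revised)
            return True
        return False

    def rollback(self) -> str:
        """Drop the newest version (never the first) and return the active one."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current
```

Keeping the full history rather than overwriting in place is what makes a later rollback possible when a revision that passed its tests still misbehaves in production.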
Connecting the Thread Back to Goal, Memory, and Harness
The author maps four layers of the Agent stack:
Goal: what the Agent aims to achieve.
Memory: past events that should influence the present.
Skill: the procedural recipe for "how to do it next time".
Harness: how to make the above observable, testable, and controllable.
Industry voices (Tobi Lütke, Andrej Karpathy, Simon Willison, Addy Osmani, Martin Fowler) converge on the idea of "context engineering" and "agentic engineering patterns": the need for proper context injection, verification loops, and governance as models become more capable.
Practical Guidance to Avoid Debt
Do not launch a massive Skill platform immediately. Instead, focus on a small, high‑trust core and close the governance loop:
Admit only high‑frequency, well‑bounded, verifiable processes.
Write description as a routing contract that includes usage boundaries and disabling conditions.
Keep SKILL.md concise; place large resources in references/, templates in assets/, and deterministic actions in scripts/.
Provide verification hooks from the start (tests, lint checks, screenshot criteria).
Track version, author, dependencies, last successful run, and retirement schedule.
Record failures: last failure, failure type (trigger, execution, environment), and resulting action (downgrade, merge, deprecate).
These steps matter because the quality of a Skill library is measured by the team's ability to recognise expired or unsafe assets, not by sheer quantity.
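The tracking and failure-recording items above can be sketched as one record per Skill plus a rule that turns its history into an action. Everything here is hypothetical: the field names, the `max_failures` threshold of 3, and the policy itself are illustrative assumptions, and the "merge" action is left out because it needs a comparison between Skills.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Failure:
    when: date
    kind: str   # "trigger", "execution", or "environment"


@dataclass
class SkillRecord:
    name: str
    version: str
    author: str
    dependencies: List[str] = field(default_factory=list)
    last_success: Optional[date] = None
    retire_after: Optional[date] = None
    failures: List[Failure] = field(default_factory=list)


def next_action(record: SkillRecord, today: date, max_failures: int = 3) -> str:
    """Return "keep", "downgrade", or "deprecate" under an illustrative policy."""
    # Past its scheduled retirement date: deprecate regardless of history.
    if record.retire_after is not None and today > record.retire_after:
        return "deprecate"
    # Only failures since the last successful run count against the Skill.
    recent = [
        f for f in record.failures
        if record.last_success is None or f.when > record.last_success
    ]
    if len(recent) >= max_failures:
        return "deprecate"
    if recent:
        return "downgrade"
    return "keep"
```

Recording the failure type alongside the date leaves room for a finer policy later, for example treating environment failures more leniently than trigger failures.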
Final Thoughts
Agent Skills are not about giving Agents a few extra tricks; they are about turning team process knowledge into a trustworthy, searchable, executable, verifiable, and evolvable asset. Without admission gates, retrieval quality, testing, versioning, permission checks, and rollback, a Skill library quickly degrades into a noisy knowledge dump, much like an unmaintained Wiki.
The next frontier for Agent systems is not just stronger models, but more reliable process assets that can be safely selected and evolved over time.
References
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications – https://arxiv.org/abs/2605.07358
Paper PDF – https://arxiv.org/pdf/2605.07358v1
Paper HTML – https://arxiv.org/html/2605.07358v1
Paper TeX Source – https://arxiv.org/e-print/2605.07358v1
Awesome‑Agent‑Skills – https://github.com/JayLZhou/Awesome-Agent-Skills
Agent Skills Specification – https://agentskills.io/specification
Anthropic Agent Skills – https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
Tobi Lütke on context engineering – https://x.com/tobi/status/1935533422589399127
Andrej Karpathy on context engineering – https://x.com/karpathy/status/1937902205765607626
Simon Willison: Agentic Engineering Patterns – https://simonwillison.net/2026/Feb/23/agentic-engineering-patterns/
Addy Osmani: Agentic Engineering – https://addyosmani.com/blog/agentic-engineering/
Martin Fowler: Harness engineering for coding agents – https://martinfowler.com/articles/harness-engineering.html
LangChain: The rise of "context engineering" – https://www.langchain.com/blog/the-rise-of-context-engineering
Anthropic Engineering: Effective context engineering for AI agents – https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.