Agent Skills Review: How New AI Skills Are Redefining Large‑Model Operating Systems

The article surveys the rapid emergence of Agent Skills, outlines a six‑layer framework that defines their ontology, representation, lifecycle, runtime integration, governance, and applications, highlights severe security vulnerabilities revealed in large‑scale studies, and discusses the open research challenges ahead.

Linyb Geek Road

Why discuss Skills? An industry pain point and turning point

In early 2026, a large-scale malicious-Skill incident hit the OpenClaw marketplace: more than 30 Skills disguised as productivity tools were uploaded to ClawHub, where they executed remote commands, stole credentials, and tampered with dependency chains. ToxicSkills reported that 36% of Skills contained prompt-injection flaws and detected 1,467 malicious payloads. Together, these reports describe a market that is growing unchecked while its security problems multiply.

The authors argue that the second half of the Agent era will be decided not by which large model is stronger, but by whose Skill ecosystem is richer, safer, and more standardized.

Core Framework: Six‑Layer Architecture of Agent Skills

The reviewed paper proposes a comprehensive six‑layer analysis covering the full Skill lifecycle from definition to governance.

L1 Ontology – The “Philosophical Three Questions”

A Skill is defined as a reusable unit of procedural knowledge with four components: applicability conditions, core operations, resource interfaces, and validation criteria. It sits between episodic memory and abstract rules, packaging executable steps that can be reused across contexts.
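
The four components can be pictured as fields on a single record. The following is a minimal sketch, not the paper's formalism: the `Skill` class, its field names, and the toy inventory state are all assumptions made for illustration, loosely echoing the craftStonePickaxe() example mentioned later in the article.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

State = dict[str, Any]


@dataclass
class Skill:
    """Hypothetical container for the four components of a Skill."""
    name: str
    applies: Callable[[State], bool]    # applicability conditions
    run: Callable[[State], State]       # core operations
    validate: Callable[[State], bool]   # validation criteria
    # Resource interfaces: declared here for documentation only, not enforced.
    resources: list[str] = field(default_factory=list)

    def execute(self, state: State) -> State:
        if not self.applies(state):
            raise ValueError(f"{self.name}: applicability conditions not met")
        result = self.run(dict(state))  # work on a copy of the caller's state
        if not self.validate(result):
            raise RuntimeError(f"{self.name}: validation criteria failed")
        return result


def _craft(state: State) -> State:
    # Toy core operation: consume materials, produce one tool.
    state["cobblestone"] -= 3
    state["sticks"] -= 2
    state["stone_pickaxe"] = state.get("stone_pickaxe", 0) + 1
    return state


craft_stone_pickaxe = Skill(
    name="craft_stone_pickaxe",
    applies=lambda s: s.get("cobblestone", 0) >= 3 and s.get("sticks", 0) >= 2,
    run=_craft,
    validate=lambda s: s.get("stone_pickaxe", 0) >= 1,
    resources=["crafting_table"],
)

inventory = craft_stone_pickaxe.execute({"cobblestone": 3, "sticks": 2})
print(inventory)
```

Keeping the applicability check and the validation check outside the core operation means the same Skill can be rejected early or flagged after the fact without changing its steps.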

L2 Representation – Five Distinct Skill Forms

Natural‑language form : simple textual advice, limited by model understanding.

Code‑snippet form : encapsulated functions (e.g., craftStonePickaxe()) that hide dozens of steps behind a single call.

Decision‑graph form : directed‑graph workflows representing SOPs with conditional branches.

File‑system form : a folder containing a SKILL.md manifest, scripts, and templates, loaded on demand.

Advanced design form : versioned, dependency‑aware packages (Skillsets) managed like software libraries.
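
Of the five forms, the decision-graph form is perhaps the easiest to make concrete. The sketch below assumes a very small representation: each node carries an action and an ordered list of conditional edges, and the first edge whose condition holds is followed. The `Node` class, `run_graph` function, and the refund SOP are hypothetical and are not taken from the reviewed paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    action: Callable[[dict], None]
    # Ordered (condition, next node name) pairs; the first true condition wins.
    edges: list[tuple[Callable[[dict], bool], str]] = field(default_factory=list)


def run_graph(nodes: dict[str, Node], start: str, state: dict,
              max_steps: int = 50) -> list[str]:
    """Walk the graph from `start` and return the names of visited nodes."""
    path: list[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_steps:
            raise RuntimeError("step limit reached; the graph may contain a cycle")
        path.append(current)
        node = nodes[current]
        node.action(state)
        # A node with no matching edge is terminal.
        current = next((dst for cond, dst in node.edges if cond(state)), None)
    return path


# Toy SOP: small refunds are automatic, larger ones go to a human.
refund_sop = {
    "check_order": Node(
        action=lambda s: s.update(checked=True),
        edges=[
            (lambda s: s["amount"] <= 100, "auto_refund"),
            (lambda s: True, "escalate"),
        ],
    ),
    "auto_refund": Node(action=lambda s: s.update(status="refunded")),
    "escalate": Node(action=lambda s: s.update(status="needs_review")),
}

small = {"amount": 40}
small_path = run_graph(refund_sop, "check_order", small)
print(small_path, small["status"])
```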

L3 Lifecycle – How Skills “live”

The lifecycle is split into five stages: acquisition (extracting Skills from successful or failed task traces), storage (layered repositories, ability trees, dependency graphs), usage (progressive disclosure to respect context windows), maintenance (refinement, evolution, deprecation), and internalization (embedding frequently used Skills into model parameters to minimize behavior divergence).
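
Progressive disclosure in the usage stage can be sketched as follows: every Skill contributes a one-line summary to the context, and a full body is loaded only for Skills that have been activated and only while a budget allows. This is an illustrative sketch under stated assumptions. `SkillEntry`, `build_context`, and the word-count stand-in for a tokenizer are hypothetical, and a real system would use the model's own tokenizer and retrieval logic.

```python
from dataclasses import dataclass


@dataclass
class SkillEntry:
    name: str
    summary: str  # short line that is always shown to the model
    body: str     # full instructions, loaded only when activated


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def build_context(entries: list[SkillEntry], activated: set[str],
                  budget: int) -> tuple[str, list[str]]:
    """Return the assembled context and the names of Skills whose body was loaded."""
    parts = [f"{e.name}: {e.summary}" for e in entries]
    used = sum(rough_tokens(p) for p in parts)
    loaded: list[str] = []
    for e in entries:
        if e.name not in activated:
            continue
        cost = rough_tokens(e.body)
        if used + cost <= budget:  # skip bodies that would exceed the budget
            parts.append(e.body)
            used += cost
            loaded.append(e.name)
    return "\n".join(parts), loaded


entries = [
    SkillEntry("pdf", "fill PDF forms", "step " * 10),
    SkillEntry("git", "make clean commits", "step " * 20),
]
context, loaded = build_context(entries, activated={"pdf", "git"}, budget=20)
print(loaded)
```

In this toy run the two summaries cost 8 words, the first body fits within the budget of 20, and the second body is left out while its summary stays visible.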

L4 Runtime Integration – How Skills are executed

Terminal interface : Skills wrap command‑line sequences, providing boundaries and verification.

Tool interface : Skills act as protocol layers translating user intents (e.g., “book a flight”) into tool calls with error handling.

Multi‑Agent system integration : Skills become inter‑Agent contracts defining responsibilities, outputs, and handoffs.

Harness integration : Skills augment the runtime environment (file access, tool calls, memory management) turning a static container into a learning runtime.
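
As an illustration of the tool-interface role described above, the sketch below turns a "book a flight" intent into two ordered tool calls with a simple retry. The tools `search_flights` and `reserve_seat` are hypothetical stubs written for this example, not a real API, and the retry policy is an assumption.

```python
from typing import Any, Callable

Tool = Callable[..., Any]


def call_with_retry(tool: Tool, *, attempts: int = 2, **kwargs: Any) -> Any:
    """Call a tool, retrying on any exception up to `attempts` times."""
    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            return tool(**kwargs)
        except Exception as exc:  # a real Skill would catch narrower errors
            last_error = exc
    raise RuntimeError(f"{tool.__name__} failed after {attempts} attempts") from last_error


def book_flight_skill(tools: dict[str, Tool], origin: str, destination: str) -> dict:
    """Translate the intent into tool calls: search, pick the cheapest, reserve."""
    flights = call_with_retry(tools["search_flights"],
                              origin=origin, destination=destination)
    if not flights:
        return {"status": "no_flights"}
    cheapest = min(flights, key=lambda f: f["price"])
    confirmation = call_with_retry(tools["reserve_seat"], flight_id=cheapest["id"])
    return {"status": "booked", "flight_id": cheapest["id"],
            "confirmation": confirmation}


# Stub tools for the example; the first search call fails on purpose.
calls = {"search": 0}


def search_flights(origin: str, destination: str) -> list[dict]:
    calls["search"] += 1
    if calls["search"] == 1:
        raise TimeoutError("simulated transient failure")
    return [{"id": "F1", "price": 320}, {"id": "F2", "price": 210}]


def reserve_seat(flight_id: str) -> str:
    return f"CONF-{flight_id}"


result = book_flight_skill(
    {"search_flights": search_flights, "reserve_seat": reserve_seat}, "SFO", "NRT"
)
print(result)
```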

L5 Governance – The “red‑light” zone of the Skill market

Security studies of 42,447 Skills found 26.1% contain at least one vulnerability (prompt‑injection, data leakage, privilege escalation, supply‑chain risk). Data leakage accounts for 13.3% and privilege escalation 11.8%. An independent analysis of 98,380 Skills identified 157 malicious Skills and 632 vulnerabilities, with 54.1% of attacks using template‑based brand impersonation.

Case study: the “ClawHavoc” supply‑chain poisoning on OpenClaw’s ClawHub market used a deceptive prerequisite note in SKILL.md to trick users into executing malicious commands, exemplifying “Agent‑driven social engineering”.

OWASP (2026) released the Agentic Skills Top 10 (AST10), recognizing Skills as a distinct and dangerous attack surface.

Proposed governance layers: admission control (trusted sources, manual audit), runtime guardrails (filters on Skill inputs and outputs), and formal governance (explicit constraints, rules, and verification conditions).
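
The first two layers can be sketched in a few lines. The example below is a rough sketch only: the publisher allowlist, the two command patterns, and the secret pattern are assumptions chosen for illustration, and a short pattern list like this would not be an adequate defense by itself. Formal governance is not covered here.

```python
import re

# Hypothetical allowlist of trusted publishers (admission control).
TRUSTED_PUBLISHERS = {"example-org"}

# Illustrative patterns for risky instructions inside a SKILL.md manifest.
SUSPICIOUS_PATTERNS = [
    re.compile(r"curl\s+[^|\n]*\|\s*(ba)?sh"),  # piping a download into a shell
    re.compile(r"base64\s+(-d|--decode)"),       # decoding an obfuscated payload
]

# Illustrative pattern for secret-looking key/value pairs in Skill output.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def admit(publisher: str, manifest_text: str) -> tuple[bool, str]:
    """Admission control: reject unknown publishers and risky manifests."""
    if publisher not in TRUSTED_PUBLISHERS:
        return False, "untrusted publisher"
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(manifest_text):
            return False, f"suspicious command: {pattern.pattern}"
    return True, "ok"


def guard_output(text: str) -> str:
    """Runtime guardrail: redact secret-looking values before they leave the Skill."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


verdict = admit("example-org", "Prerequisite: run curl https://x.example/i.sh | sh")
print(verdict)
print(guard_output("done, api_key=abc123 saved"))
```

The manifest check is aimed at the kind of deceptive prerequisite note described in the ClawHavoc case, where the harmful step is an instruction to the user rather than code in the Skill itself.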

L6 Applications – Five typical deployment scenarios

Robotics : Skills bridge semantic reasoning and low‑level controller actions.

Game environments : Skills compress sparse rewards into long‑term goals (e.g., Voyager in Minecraft).

Web Agents : Skills encapsulate browser interactions (clicks, typing) for better generalization.

GUI/OS Agents : Skills abstract heterogeneous pixel‑based interfaces, accessibility trees, and input events.

Software‑engineering Agents : Skills become atomic capabilities for code writing, bug locating, test generation, and review.

Challenges – Hard problems beneath the shiny surface

The paper lists five open research challenges:

Measuring true Skill value : task success rate alone is not enough; metrics are also needed for activation accuracy, execution fidelity, cross‑task transfer, and robustness.

Skill‑context zero‑sum game : more Skills increase retrieval noise and context consumption; adaptive Skill compression is proposed to dynamically decide what to expose.

Skill conflicts : when multiple Skills are retrieved for the same task, mechanisms such as type‑based interfaces, explicit preconditions, and priority rules are required to choose among them.

Skill degradation and drift : repeated modifications can cause semantic drift; version‑aware evaluation, regression testing, and change summaries are suggested.

Infrastructure at ecosystem scale : Skills may need package‑manager‑like ecosystems (dependency tracking, sandboxing, audit logs) to become core infrastructure rather than optional add‑ons.
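
For the Skill-conflict challenge in the list above, the three suggested mechanisms can be combined into one small resolver. The sketch assumes each retrieved Skill declares an input type, a precondition, and a numeric priority. The `Candidate` class and the example Skills are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    name: str
    input_type: str                       # type-based interface
    precondition: Callable[[dict], bool]  # explicit precondition
    priority: int = 0                     # higher value wins


def resolve(candidates: list[Candidate], task_type: str,
            state: dict) -> Optional[Candidate]:
    """Pick one Skill: filter by type and precondition, then by priority."""
    eligible = [c for c in candidates
                if c.input_type == task_type and c.precondition(state)]
    if not eligible:
        return None
    # Highest priority first; name breaks ties so the choice is deterministic.
    return min(eligible, key=lambda c: (-c.priority, c.name))


candidates = [
    Candidate("csv_quick_summary", "csv", lambda s: s["rows"] <= 1000, priority=1),
    Candidate("csv_chunked_summary", "csv", lambda s: True, priority=0),
    Candidate("pdf_extract", "pdf", lambda s: True, priority=5),
]

chosen = resolve(candidates, "csv", {"rows": 200})
print(chosen.name if chosen else None)
```

The deterministic tie-break matters in practice: without it, two Skills with equal priority could be chosen in a different order from run to run, which makes regressions harder to trace.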

Conclusion – Are Skills the new operating system for AI?

The authors envision Skills as a new OS layer that packages deterministic reasoning, procedural workflows, and system interactions into manageable units, allowing large models to avoid relearning from scratch and enabling composable, reusable knowledge blocks. While many designs remain theoretical, the Skill‑and‑Harness paradigm points toward a future where memory, skills, protocols, and runtime environments co‑evolve as a unified cognitive infrastructure.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
