Mastering Agent Skills: Design, Implementation, and Evaluation for Efficient AI Workflows

This article explains why large language model agents need structured Skills to capture team-specific knowledge and workflows. It traces the evolution from MCP to Skills, details the progressive-disclosure design, and shows how to write, organize, and install Skills. It also lays out a systematic evaluation and iteration process for high-quality, token-efficient agent behavior.

Baidu Geek Talk

Background and Motivation

Large language model agents can write code and call tools, but they lack team-specific knowledge, coding standards, and workflow details. This makes them inefficient and their results inconsistent, and teams often have to re-teach the same context in every conversation.

Evolution of Agent Extensibility

First, the Model Context Protocol (MCP, November 2024) standardized how agents call external tools. Next, community-driven AGENTS.md files gave agents a conventional place to read project conventions. Finally, in October 2025 Anthropic introduced Skills: structured folders containing a description, instructions, references, and scripts. Skills have since become a mainstream extension mechanism.

Core Design of Skill – Progressive Disclosure

A Skill's content is disclosed in three layers:

YAML front-matter description: a concise natural-language statement of what the skill does, its core capability, and when it should activate.

SKILL.md body: full instructions, workflow steps, and examples, loaded only when the description matches the user request.

references/ and scripts/ directories: auxiliary documentation and executable scripts, accessed on demand.

Because only the short descriptions occupy the context until a skill is actually needed, this design keeps token usage low and focuses the model's attention on the information relevant to the current task.

Organizing and Installing Skills

Store all skills in a Git repository to obtain version history and team collaboration. A minimal layout looks like:

team-skills/
├── code-review/
│   └── SKILL.md
├── react-state-management/
│   ├── SKILL.md
│   └── references/
├── sprint-planning/
│   ├── SKILL.md
│   └── scripts/
└── ...

Installation can be automated with the open-source skills CLI, for example:

npx skills add https://github.com/your-team/skills/tree/main/code-review

Alternatively, place the folder manually in a platform-specific directory, such as .claude/skills/code-review/SKILL.md.
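For the manual route, the copy step is easy to script. The Python sketch below assumes the .claude/skills/<skill-name>/ layout from the example path above; install_skill is a hypothetical helper, not part of the skills CLI, and other agent platforms use different target directories.

```python
import shutil
from pathlib import Path


def install_skill(source_dir: str, project_root: str = ".") -> Path:
    """Copy a skill folder into <project_root>/.claude/skills/<skill-name>/.

    Assumes the Claude-style project layout; other platforms use other paths.
    """
    src = Path(source_dir)
    # Refuse to install anything that is not a skill folder.
    if not (src / "SKILL.md").is_file():
        raise FileNotFoundError(f"{src} has no SKILL.md")
    dest = Path(project_root) / ".claude" / "skills" / src.name
    if dest.exists():
        # Replace an older installed copy so it mirrors the repository version.
        shutil.rmtree(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest)
    return dest
```

Because the installed folder is replaced wholesale, re-running the helper after a git pull keeps the local copy in sync with the team repository.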

How to Write a Skill

File Structure

A Skill folder must contain a file named exactly SKILL.md (case‑sensitive). Optional subfolders are scripts/, references/, and assets/. Folder names use kebab‑case (e.g., my-cool-skill).
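These rules can be checked automatically before a skill is committed. A minimal Python (3.9+) sketch follows; check_skill_folder is a hypothetical name, and flagging any subfolder other than scripts/, references/, and assets/ is a stricter policy assumed here for illustration.

```python
import re
from pathlib import Path

# kebab-case: lowercase letters/digits separated by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
STANDARD_SUBFOLDERS = {"scripts", "references", "assets"}


def check_skill_folder(folder: str) -> list[str]:
    """Return structural problems; an empty list means the folder passes."""
    path = Path(folder)
    problems = []
    if not KEBAB_CASE.match(path.name):
        problems.append(f"folder name '{path.name}' is not kebab-case")
    # Compare actual directory entries so the check stays case-sensitive
    # even on case-insensitive file systems ('skill.md' does not count).
    entry_names = {p.name for p in path.iterdir()}
    if "SKILL.md" not in entry_names:
        problems.append("missing SKILL.md (file name is case-sensitive)")
    for sub in sorted(p for p in path.iterdir() if p.is_dir()):
        if sub.name not in STANDARD_SUBFOLDERS:
            problems.append(f"non-standard subfolder '{sub.name}'")
    return problems
```

Listing entries instead of calling exists() matters on macOS and Windows, where a lookup for SKILL.md would also succeed for skill.md.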

SKILL.md Layout

Two parts are required, a YAML front-matter block and a Markdown body:

---
name: my-skill-name
description: What it does. Trigger phrase(s). Core capabilities A, B, C.
---
# My Skill Name
## Instructions

Both front-matter fields are required: name identifies the skill, and description drives activation. The Markdown body contains the actual instructions the agent follows.
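To make the layout concrete, here is a simplified Python parser that splits SKILL.md into front matter and body and rejects files missing name or description (both appear in the template above). It deliberately handles only flat key: value lines; parse_skill_md is a hypothetical helper, and a real tool should use a proper YAML library.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md into (front-matter fields, Markdown body).

    Simplified: only flat 'key: value' lines are supported (no YAML lists,
    nesting, or multi-line strings).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' front-matter block")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("front-matter block is not closed with '---'") from None
    fields = {}
    for line in lines[1:end]:
        # Split on the first colon only, so descriptions may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    for required in ("name", "description"):
        if not fields.get(required):
            raise ValueError(f"front matter is missing required field '{required}'")
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body
```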

Working Principle

Three stages occur at runtime:

Indexing: All description strings are injected into the agent's system prompt.

Activation: When a user query matches a description, the agent issues a tool call (for example, view or read) to fetch the full SKILL.md.

Execution: The agent follows the instructions; if the body points to references/ or scripts/, those resources are loaded only when needed.
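The three stages can be sketched in Python. One loud assumption: in a real agent the model itself decides whether a query matches a description; the word-overlap scorer in select_skill is only a stand-in so the sketch runs. All names here are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SkillIndexEntry:
    name: str
    description: str
    path: Path  # location of SKILL.md; the body is not read during indexing


def build_system_prompt(entries: list[SkillIndexEntry]) -> str:
    """Indexing: only each skill's name and description enter the system prompt."""
    lines = ["Available skills:"]
    lines += [f"- {e.name}: {e.description}" for e in entries]
    return "\n".join(lines)


def select_skill(query: str, entries: list[SkillIndexEntry]) -> Optional[SkillIndexEntry]:
    """Activation (stand-in): pick the description sharing the most words with
    the query. A real agent delegates this decision to the model."""
    query_words = set(query.lower().split())
    best, best_score = None, 0
    for entry in entries:
        score = len(query_words & set(entry.description.lower().split()))
        if score > best_score:
            best, best_score = entry, score
    return best


def load_skill_body(entry: SkillIndexEntry) -> str:
    """Activation -> execution: the equivalent of the agent's read tool call.
    Files under references/ or scripts/ would be read the same way, on demand."""
    return entry.path.read_text(encoding="utf-8")
```

The token saving comes from the split between build_system_prompt, which runs for every conversation, and load_skill_body, which runs only for the selected skill.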

Designing Effective Descriptions

A good description answers three questions:

What does the skill do?

What core capability does it provide?

When should it be triggered?

Two pitfalls are common: descriptions that are too broad, and descriptions that omit trigger information. If a skill activates too often, add negative triggers (for example, "Do not use for ...") to narrow it.
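The three questions can be turned into a rough lint that runs before a skill is committed. The regular expressions below are heuristics assumed for illustration, not an official rule set, and the 1,024-character default cap is assumed to match Anthropic's published limit; adjust it for your platform.

```python
import re

# Heuristic phrase patterns (assumptions, not a standard).
TRIGGER_HINTS = re.compile(r"\b(use when|use for|triggers? (on|when)|when the user)\b", re.I)
NEGATIVE_HINTS = re.compile(r"\b(do not use|don't use|not for)\b", re.I)


def lint_description(description: str, max_chars: int = 1024) -> list[str]:
    """Return warnings; an empty list means no heuristic fired."""
    warnings = []
    text = description.strip()
    # Too few words rarely covers both what the skill does and its core capability.
    if len(text.split()) < 8:
        warnings.append("too short to state what the skill does and its core capability")
    if len(text) > max_chars:
        warnings.append(f"longer than {max_chars} characters")
    if not TRIGGER_HINTS.search(text):
        warnings.append("no explicit trigger (e.g. 'Use when ...')")
    if not NEGATIVE_HINTS.search(text):
        warnings.append("no negative trigger; consider one if the skill over-activates")
    return warnings
```

The negative-trigger warning is advisory: many skills never need one, so it is best treated as a prompt to check the "should not trigger" test results.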

Body Design Patterns

Knowledge‑Document Style

Used when the skill supplies domain knowledge, quality checklists, or few‑shot examples. Example snippet:

## Code Review Standards
### Critical Checks (must pass)
1. No hard‑coded credentials
2. All user inputs sanitized
3. Error boundaries on async ops
### Example Review
**Input:** A React component with inline styles
**Expected output:**
- Flag: inline styles → suggest CSS modules
- Suggest: extract magic numbers to constants

Workflow Style

Used for multi‑step processes with validation between steps. Example:

## Sprint Planning Workflow
### Step 1: Gather Context
Fetch current project status from Linear.
**Validation:** At least one active project returned.
### Step 2: Analyze Velocity
Calculate team velocity from last 3 sprints.
**Validation:** Velocity data covers at least 2 complete sprints.
### Step 3: Draft Plan
Create task breakdown with estimates.
**Validation:** Total story points ≤ average velocity × 0.85 (buffer).
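The numeric checks in Steps 2 and 3 are deterministic, which makes them natural candidates for the skill's scripts/ folder. A minimal Python sketch, assuming completed story points per sprint are available as a list ordered oldest to newest (for example, exported from Linear); the function names are hypothetical.

```python
from statistics import mean


def plan_capacity(completed_points: list[float], window: int = 3, buffer: float = 0.85) -> float:
    """Step 2: average velocity over the last `window` sprints, times the safety buffer."""
    recent = completed_points[-window:]
    # Step 2 validation: velocity must cover at least 2 complete sprints.
    if len(recent) < 2:
        raise ValueError("need velocity data for at least 2 complete sprints")
    return mean(recent) * buffer


def validate_plan(task_estimates: list[float], capacity: float) -> bool:
    """Step 3 validation: total story points must not exceed the buffered velocity."""
    return sum(task_estimates) <= capacity
```

Keeping the arithmetic in a script means the agent only has to call it and report the result, rather than recomputing velocity in prose each time.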

Evaluation and Iterative Improvement

Testing Descriptions

Build a test set of 16-20 queries split into “should trigger” and “should not trigger”. Measure recall (the share of should-trigger queries that actually activate the skill) and precision (the share of activations that were warranted; every activation on a should-not-trigger query lowers it). Include realistic phrasing, misspellings, and near-miss scenarios.
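Given labeled results from a test run, the metrics follow directly. This Python sketch also reports specificity (the share of should-not-trigger queries that were correctly ignored), the quantity the "should not trigger" half of the set measures most directly. TriggerCase is a hypothetical record type.

```python
from dataclasses import dataclass


@dataclass
class TriggerCase:
    query: str
    should_trigger: bool  # label from the test set
    triggered: bool       # what the agent actually did in the run


def trigger_metrics(cases: list[TriggerCase]) -> dict[str, float]:
    tp = sum(c.should_trigger and c.triggered for c in cases)
    fn = sum(c.should_trigger and not c.triggered for c in cases)
    fp = sum(not c.should_trigger and c.triggered for c in cases)
    tn = sum(not c.should_trigger and not c.triggered for c in cases)
    return {
        # Of the queries that should trigger, how many did.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Of the activations that happened, how many were warranted.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the queries that should not trigger, how many were left alone.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Low recall suggests the description is missing trigger phrases; low precision or specificity suggests it is too broad and may need negative triggers.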

Testing Bodies

Run controlled A/B experiments comparing the agent with the skill against a baseline (no skill, or the previous version). Define objective metrics (schema validation, required fields) and subjective checklists (useful suggestions, false positives), and record token consumption and latency for each run.
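A small harness can compute the objective side of the comparison. The Python sketch below assumes the skill's output is a JSON list of findings with hypothetical required fields file, line, severity, and message; the subjective checklist items still need human or model grading.

```python
import json
from dataclasses import dataclass
from statistics import mean

# Hypothetical output contract for a code-review skill.
REQUIRED_FIELDS = {"file", "line", "severity", "message"}


@dataclass
class RunResult:
    output: str       # raw agent output, expected to be a JSON list of findings
    tokens: int       # total tokens consumed by the run
    latency_s: float  # wall-clock seconds


def passes_schema(output: str) -> bool:
    """Objective check: valid JSON list whose findings all carry the required fields."""
    try:
        findings = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(findings, list) and all(
        isinstance(f, dict) and REQUIRED_FIELDS.issubset(f) for f in findings
    )


def summarize(runs: list[RunResult]) -> dict[str, float]:
    return {
        "pass_rate": sum(passes_schema(r.output) for r in runs) / len(runs),
        "avg_tokens": mean(r.tokens for r in runs),
        "avg_latency_s": mean(r.latency_s for r in runs),
    }


def compare(with_skill: list[RunResult], baseline: list[RunResult]) -> dict[str, float]:
    """Per-metric difference: higher pass_rate is better; higher tokens/latency costs more."""
    a, b = summarize(with_skill), summarize(baseline)
    return {key: a[key] - b[key] for key in a}
```

Reporting cost next to quality makes the trade-off explicit: a skill that raises the pass rate but doubles token use may still need trimming.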

Iterative Loop

Follow five steps: run experiments, score, analyze feedback, modify SKILL.md or scripts, and re‑run. Stop when improvements plateau or the skill meets quality goals.
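The loop can be expressed with an explicit stopping rule. In this Python sketch, evaluate and revise are hypothetical callbacks (revise stands in for analyzing feedback and editing SKILL.md or scripts), and the target, min_gain, and patience values are arbitrary assumptions.

```python
from typing import Callable


def iterate(
    evaluate: Callable[[int], float],
    revise: Callable[[int, float], None],
    target: float,
    min_gain: float = 0.01,
    patience: int = 2,
    max_rounds: int = 10,
) -> list[float]:
    """Run, score, revise, and re-run until the target is met or gains plateau."""
    history: list[float] = []
    stalled = 0
    for rnd in range(max_rounds):
        score = evaluate(rnd)              # run experiments and score
        history.append(score)
        if score >= target:
            break                          # quality goal met
        if len(history) > 1 and score - history[-2] < min_gain:
            stalled += 1
            if stalled >= patience:
                break                      # improvements have plateaued
        else:
            stalled = 0
        revise(rnd, score)                 # analyze feedback and modify the skill
    return history                         # the next loop pass is the re-run
```

Requiring several consecutive small gains (patience) avoids stopping on a single noisy round.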

Case Studies

Skill‑Creator Evolution

Version 1 generated a basic SKILL.md. Version 2 added self‑analysis and improvement. Version 3 incorporated the full evaluation loop described above.

Code‑Review Skill

The initial version, a single simple prompt, ran into context overflow. The second version assigned a sub-agent to each file, defined explicit contracts for their output, and added robust validation, yielding higher coverage and a lower false-positive rate.
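The per-file design can be sketched as a fan-out plus a contract check. This is an illustration, not the article's implementation: review_file is a hypothetical stand-in for launching a sub-agent, and the finding fields and severity levels are an assumed contract.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed contract every per-file sub-agent must satisfy.
SEVERITIES = {"low", "medium", "high"}


def valid_finding(f: dict, path: str) -> bool:
    return (
        f.get("file") == path
        and isinstance(f.get("line"), int)
        and f.get("severity") in SEVERITIES
        and bool(f.get("message"))
    )


def review_changeset(
    files: dict[str, str],
    review_file: Callable[[str, str], list[dict]],
    max_workers: int = 4,
) -> tuple[list[dict], list[str]]:
    """Run one sub-agent call per file so no single context holds the whole diff.

    Returns (accepted findings, paths whose output violated the contract).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with file paths.
        outputs = pool.map(lambda item: review_file(*item), files.items())
        results = dict(zip(files, outputs))
    accepted, rejected = [], []
    for path, findings in results.items():
        if all(valid_finding(f, path) for f in findings):
            accepted.extend(findings)
        else:
            rejected.append(path)  # candidate for a retry or manual review
    return accepted, rejected
```

Rejecting a file's whole output on any contract violation keeps malformed findings out of the final report; retrying rejected paths is a natural extension.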

Conclusion

Skill provides a standardized, progressive‑disclosure mechanism that makes agents reusable, efficient, and aligned with team processes. Proper description crafting, body design, and systematic evaluation are essential to turn a generic LLM into a reliable, domain‑specific assistant.
