Step‑by‑Step Guide to Writing Effective Agent SKILL.md Files
This article explains what Agent Skills are, shows the folder layout and SKILL.md format, introduces the progressive‑disclosure design, provides concrete best‑practice tips, testing and evaluation methods, and demonstrates how to package scripts for reliable AI‑assistant automation.
What Are Agent Skills?
At its core, a Skill is a directory whose most important file is SKILL.md. The file acts as a task description, containing both metadata (what the skill does) and detailed instructions (how to do it).
Folder Layout
my-skill/
├── SKILL.md # required: metadata + core commands
├── scripts/ # optional: executable scripts (Python, Bash, …)
├── references/ # optional: reference docs (API specs, guides)
└── assets/ # optional: static resources (templates, images)
Progressive Disclosure: Like a Delivery Rider
Step 1 – Discovery 🚴‍♂️: When the agent starts, it reads only the lightweight name and description fields to decide whether the Skill is relevant.
Step 2 – Activation ✅: If the task matches, the full SKILL.md file is loaded into context.
Step 3 – Execution 🏃‍♂️: The agent follows the instructions, loading files from references/ or running scripts from scripts/ as needed.
This design keeps the model’s context window small while still allowing complex, multi‑step operations.
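The discovery step above can be sketched as a tiny frontmatter reader that never touches the body of SKILL.md (a hypothetical helper, not part of any official runtime):

```python
# Hypothetical sketch of step 1 (discovery): read only the top-level
# frontmatter fields (name, description) without loading the body.
def read_skill_metadata(skill_md: str) -> dict[str, str]:
    """Parse the YAML frontmatter between the first two `---` lines."""
    lines = skill_md.splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the instruction body is never read here
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta
```

Only after this cheap check matches does the agent pay the context cost of loading the full file.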
SKILL.md File Structure
---
name: pdf-processing
description: Extract text and tables from PDFs, fill forms, and merge files. Use this skill when a user mentions PDFs, forms, or data extraction.
license: Apache-2.0
metadata:
  author: your-team
  version: "1.0"
---
# PDF Processing Skill
## When to Use This Skill
When the user asks for PDF, form, or table extraction, activate this skill.
## How to Extract Text
1. Use `pdfplumber` – the preferred library.
2. For scanned PDFs, fall back to `pdf2image` + `pytesseract`.
## Common Gotchas
- ⚠️ Scanned PDFs return empty results with `pdfplumber`; detect text layer first and switch to OCR.
- When filling forms, always run `scripts/analyze_form.py` to list fields before populating them.
Metadata Fields Explained
name (required): Identifier; lowercase letters, numbers, and hyphens; max 64 characters; must match the parent directory name.
description (required): Core phrase that tells the agent when to use the skill; up to 1024 characters; include keywords.
license (optional): License information.
compatibility (optional): Environment requirements, e.g., "Python 3.10+ and uv package manager".
metadata (optional): Custom key‑value pairs for extra information.
allowed-tools (optional): Experimental list of pre‑approved tools.
Writing a High‑Quality Skill
1. Derive From Real Experience
Instead of vague statements like "handle errors" or "follow best practices", capture concrete steps, corrections, and input‑output formats that you have actually used in a project.
2. Define Clear Boundaries
Design the skill like a function: it should perform a single coherent unit of work. Too narrow a scope requires chaining many skills; too broad a scope makes activation ambiguous.
3. Teach the Method, Not the Answer
Provide the reasoning process so the model can generalise. For example, instead of recording one correct answer to a single query, document the steps used to reach it so the same method applies to new inputs.
4. Provide Default Choices
When multiple tools are possible, recommend a default (e.g., `pdfplumber`) and list alternatives only as fallbacks.
5. Include a Gotchas Section
Document pitfalls that only someone who has encountered them would know, such as soft‑delete columns or health‑check endpoints that mislead the agent.
6. Supply Output Templates and Checklists
Give the agent a concrete Markdown template for reports and a checklist of steps to ensure nothing is missed.
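For example, a report template plus checklist might look like this (a hypothetical sketch; adapt the headings to your task):

```markdown
# Analysis Report

## Summary
<one-paragraph overview of the result>

## Findings
- Finding 1, with supporting numbers

## Checklist
- [ ] Input files validated
- [ ] Every requested column profiled
- [ ] Chart saved and linked above
```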
Ensuring the Skill Is Seen by the Agent
The description field is the primary hook. A vague description like "process CSV" is unlikely to be matched, whereas a detailed, keyword-rich description makes activation reliable.
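For example (both descriptions are illustrative):

```yaml
# Too vague; unlikely to be matched:
description: process CSV

# Keyword-rich; tells the agent exactly when to activate:
description: >-
  Analyze CSV files: profile columns, compute summary statistics, and
  generate charts. Use when the user mentions CSV, spreadsheets, or
  tabular data analysis.
```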
Systematic Evaluation of a Skill
Write test cases in evals/evals.json with three parts: a prompt, the expected output, and optional input files. A typical case might describe a CSV‑analysis task whose expected output is a chart.
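A minimal evals/evals.json might look like this (field names are illustrative; match whatever schema your harness expects):

```json
[
  {
    "prompt": "Analyze sales.csv and chart total revenue by region",
    "expected_output": "A bar chart with one bar per region plus a short textual summary of totals",
    "input_files": ["sales.csv"]
  }
]
```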
Design Test Cases
Prompt: A real user message.
Expected Output: A human‑readable description of the correct result.
Input File (optional): Files the skill needs.
Run Evaluations
Execute each test twice – once with the skill enabled and once without – to obtain a baseline. Store outputs, timing, and token usage in a structured workspace hierarchy.
Write Assertions
Assertions are concrete, verifiable statements such as "output is valid JSON" or "chart contains three bars". Weak assertions like "output looks good" are avoided.
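For instance, the "output is valid JSON" assertion can be checked mechanically (a standard-library sketch; the helper name is hypothetical):

```python
import json


def assert_valid_json(output: str):
    """Fail loudly if the agent's output is not parseable JSON."""
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"output is not valid JSON: {exc}") from exc
```

Machine-checkable assertions like this one make pass/fail decisions reproducible across evaluation runs.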
Scoring Principles
Pass only with concrete evidence; vague matches fail.
Inspect the difficulty of assertions – discard those that always pass or always fail.
Aggregate Results
Summarise pass rates, execution time, and token consumption in benchmark.json. The delta field shows the trade‑off between cost (time, tokens) and benefit (higher pass rate).
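A benchmark.json summary might look like this (all numbers and field names are illustrative):

```json
{
  "with_skill":    { "pass_rate": 0.90, "avg_seconds": 41, "avg_tokens": 12500 },
  "without_skill": { "pass_rate": 0.40, "avg_seconds": 28, "avg_tokens": 9800 },
  "delta":         { "pass_rate": 0.50, "avg_seconds": 13, "avg_tokens": 2700 }
}
```

Here the delta row would show the skill buying a large pass-rate gain at a modest cost in time and tokens.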
Iterative Improvement Loop
Use failed assertions, human feedback, and execution logs to guide revisions. Typical steps:
Feed the current SKILL.md and evaluation signals to an LLM for improvement suggestions.
Apply the changes.
Run a new iteration of tests.
Score and aggregate results.
Repeat until improvements plateau.
Packaging Scripts for Reuse
Skills can invoke one‑off commands directly or bundle reusable logic in the scripts/ directory. Common one‑off runners include uvx, pipx, npx, bunx, deno run, and go run; pin tool versions for reproducibility.
Referencing Scripts
Use relative paths from the skill root, e.g., bash scripts/validate.sh "$INPUT_FILE". List available scripts in the markdown so the agent knows they exist.
Self‑Contained Scripts
Examples include a Python script with a PEP 723 `# /// script` TOML block declaring dependencies, a Deno script with an import map, and a Ruby script using `bundler/inline` to declare gems.
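The PEP 723 pattern can be sketched as follows (a hypothetical `scripts/count_rows.py`; the empty dependency list is illustrative, since this sketch needs only the standard library):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []   # a real skill might declare e.g. ["pdfplumber"] here
# ///
"""Count data rows in a CSV file. Run with: uv run scripts/count_rows.py data.csv"""
import csv
import sys


def count_rows(path: str) -> int:
    """Return the number of rows, excluding the header line."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_rows(sys.argv[1]))
```

Because the dependency block travels with the file, a runner like `uv run` can execute it without a separately managed virtual environment.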
Designing Agent‑Friendly Scripts
Avoid interactive prompts; provide clear error messages and a --help usage description.
Emit structured output (JSON, CSV) on stdout and diagnostics on stderr.
Make commands idempotent and expose meaningful exit codes.
Offer `--dry-run` or `--output` flags for large results.
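The guidelines above might combine into a skeleton like this (a hypothetical sketch; the flag names and output schema are illustrative):

```python
# Hypothetical agent-friendly CLI: no interactive prompts, JSON on stdout,
# diagnostics on stderr, and a meaningful exit code.
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Validate an input file.")
    parser.add_argument("path", help="file to validate")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would be done without doing it")
    return parser


def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    print(f"validating {args.path}", file=sys.stderr)  # diagnostics -> stderr
    result = {"path": args.path, "dry_run": args.dry_run, "ok": True}
    print(json.dumps(result))                          # structured output -> stdout
    return 0                                           # non-zero would signal failure


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Keeping structured results on stdout and chatter on stderr lets the agent pipe the output straight into its next step.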
Final Thoughts
Agent Skills are a lightweight, structured way to give a general‑purpose LLM domain expertise without modifying the model itself. The progressive‑disclosure principle keeps context size low while enabling sophisticated, repeatable workflows. By turning repetitive, error‑prone tasks into well‑defined Skills, you not only empower the AI assistant but also create a living knowledge base for your team.
