Step‑by‑Step Guide to Writing Effective Agent SKILL.md Files
This article explains what Agent Skills are, shows the folder layout and SKILL.md format, introduces the progressive‑disclosure design, provides concrete best‑practice tips, testing and evaluation methods, and demonstrates how to package scripts for reliable AI‑assistant automation.
What Are Agent Skills?
At its core, a Skill is a directory whose most important file is SKILL.md. The file acts as a task description, containing both metadata (what the skill does) and detailed instructions (how to do it).
Folder Layout
my-skill/
├── SKILL.md # required: metadata + core commands
├── scripts/ # optional: executable scripts (Python, Bash, …)
├── references/ # optional: reference docs (API specs, guides)
└── assets/ # optional: static resources (templates, images)
Progressive Disclosure: Like a Delivery Rider
Step 1 – Discovery 🚴‍♂️: When the agent starts, it reads only the lightweight name and description fields to decide whether the Skill is relevant.
Step 2 – Activation ✅: If the task matches, the full SKILL.md file is loaded into context.
Step 3 – Execution 🏃‍♂️: The agent follows the instructions, loading files from references/ or running scripts from scripts/ as needed.
This design keeps the model’s context window small while still allowing complex, multi‑step operations.
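The discovery step above can be sketched as a tiny frontmatter reader that never touches the body of SKILL.md (a hypothetical helper, not part of any official runtime):

```python
# Hypothetical sketch of step 1 (discovery): read only the top-level
# frontmatter fields (name, description) without loading the body.
def read_skill_metadata(skill_md: str) -> dict[str, str]:
    """Parse the YAML frontmatter between the first two `---` lines."""
    lines = skill_md.splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the instruction body is never read here
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta
```

Only after this cheap check matches does the agent pay the context cost of loading the full file.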
SKILL.md File Structure
---
name: pdf-processing
description: Extract text and tables from PDFs, fill forms, and merge files. Use this skill when a user mentions PDFs, forms, or data extraction.
license: Apache-2.0
metadata:
  author: your-team
  version: "1.0"
---
# PDF Processing Skill
## When to Use This Skill
When the user asks for PDF, form, or table extraction, activate this skill.
## How to Extract Text
1. Use `pdfplumber` – the preferred library.
2. For scanned PDFs, fall back to `pdf2image` + `pytesseract`.
## Common Gotchas
- ⚠️ Scanned PDFs return empty results with `pdfplumber`; detect text layer first and switch to OCR.
- When filling forms, always run `scripts/analyze_form.py` to list fields before populating them.
Metadata Fields Explained
name (required): Identifier; lowercase letters, numbers, and hyphens; max 64 characters; must match the parent directory name.
description (required): Core phrase that tells the agent when to use the skill; up to 1024 characters; include keywords.
license (optional): License information.
compatibility (optional): Environment requirements, e.g., "Python 3.10+ and uv package manager".
metadata (optional): Custom key‑value pairs for extra information.
allowed-tools (optional): Experimental list of pre‑approved tools.
Writing a High‑Quality Skill
1. Derive From Real Experience
Instead of vague statements like "handle errors" or "follow best practices", capture concrete steps, corrections, and input‑output formats that you have actually used in a project.
2. Define Clear Boundaries
Design the skill like a function: it should perform a single coherent unit of work. Too narrow a scope requires chaining many skills; too broad a scope makes activation ambiguous.
3. Teach the Method, Not the Answer
Provide the reasoning process so the model can generalise. For example, instead of recording one correct answer to a single query, document the steps used to reach it so the same method applies to new inputs.
4. Provide Default Choices
When multiple tools are possible, recommend a default (e.g., `pdfplumber`) and list alternatives only as fallbacks.
5. Include a Gotchas Section
Document pitfalls that only someone who has encountered them would know, such as soft‑delete columns or health‑check endpoints that mislead the agent.
6. Supply Output Templates and Checklists
Give the agent a concrete Markdown template for reports and a checklist of steps to ensure nothing is missed.
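For example, a report template plus checklist might look like this (a hypothetical sketch; adapt the headings to your task):

```markdown
# Analysis Report

## Summary
<one-paragraph overview of the result>

## Findings
- Finding 1, with supporting numbers

## Checklist
- [ ] Input files validated
- [ ] Every requested column profiled
- [ ] Chart saved and linked above
```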
Ensuring the Skill Is Seen by the Agent
The description field is the primary hook. A vague description like "process CSV" is unlikely to be matched, whereas a detailed, keyword-rich description makes activation reliable.
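For example (both descriptions are illustrative):

```yaml
# Too vague; unlikely to be matched:
description: process CSV

# Keyword-rich; tells the agent exactly when to activate:
description: >-
  Analyze CSV files: profile columns, compute summary statistics, and
  generate charts. Use when the user mentions CSV, spreadsheets, or
  tabular data analysis.
```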
Systematic Evaluation of a Skill
Write test cases in evals/evals.json with three parts: a prompt, the expected output, and optional input files. A typical case might describe a CSV‑analysis task whose expected output is a chart.
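A minimal evals/evals.json might look like this (field names are illustrative; match whatever schema your harness expects):

```json
[
  {
    "prompt": "Analyze sales.csv and chart total revenue by region",
    "expected_output": "A bar chart with one bar per region plus a short textual summary of totals",
    "input_files": ["sales.csv"]
  }
]
```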
Design Test Cases
Prompt: A real user message.
Expected Output: A human‑readable description of the correct result.
Input File (optional): Files the skill needs.
Run Evaluations
Execute each test twice – once with the skill enabled and once without – to obtain a baseline. Store outputs, timing, and token usage in a structured workspace hierarchy.
Write Assertions
Assertions are concrete, verifiable statements such as "output is valid JSON" or "chart contains three bars". Weak assertions like "output looks good" are avoided.
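For instance, the "output is valid JSON" assertion can be checked mechanically (a standard-library sketch; the helper name is hypothetical):

```python
import json


def assert_valid_json(output: str):
    """Fail loudly if the agent's output is not parseable JSON."""
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        raise AssertionError(f"output is not valid JSON: {exc}") from exc
```

Machine-checkable assertions like this one make pass/fail decisions reproducible across evaluation runs.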
Scoring Principles
Pass only with concrete evidence; vague matches fail.
Inspect the difficulty of assertions – discard those that always pass or always fail.
Aggregate Results
Summarise pass rates, execution time, and token consumption in benchmark.json. The delta field shows the trade‑off between cost (time, tokens) and benefit (higher pass rate).
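A benchmark.json summary might look like this (all numbers and field names are illustrative):

```json
{
  "with_skill":    { "pass_rate": 0.90, "avg_seconds": 41, "avg_tokens": 12500 },
  "without_skill": { "pass_rate": 0.40, "avg_seconds": 28, "avg_tokens": 9800 },
  "delta":         { "pass_rate": 0.50, "avg_seconds": 13, "avg_tokens": 2700 }
}
```

Here the delta row would show the skill buying a large pass-rate gain at a modest cost in time and tokens.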
Iterative Improvement Loop
Use failed assertions, human feedback, and execution logs to guide revisions. Typical steps:
Feed the current SKILL.md and evaluation signals to an LLM for improvement suggestions.
Apply the changes.
Run a new iteration of tests.
Score and aggregate results.
Repeat until improvements plateau.
Packaging Scripts for Reuse
Skills can invoke one‑off commands directly or bundle reusable logic in the scripts/ directory. Common one‑off runners include uvx, pipx, npx, bunx, deno run, and go run; pin tool versions for reproducibility.
Referencing Scripts
Use relative paths from the skill root, e.g., bash scripts/validate.sh "$INPUT_FILE". List available scripts in the markdown so the agent knows they exist.
Self‑Contained Scripts
Examples include a Python script with a PEP 723 `# /// script` TOML block declaring dependencies, a Deno script with an import map, and a Ruby script using `bundler/inline` to declare gems.
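The PEP 723 pattern can be sketched as follows (a hypothetical `scripts/count_rows.py`; the empty dependency list is illustrative, since this sketch needs only the standard library):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []   # a real skill might declare e.g. ["pdfplumber"] here
# ///
"""Count data rows in a CSV file. Run with: uv run scripts/count_rows.py data.csv"""
import csv
import sys


def count_rows(path: str) -> int:
    """Return the number of rows, excluding the header line."""
    with open(path, newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_rows(sys.argv[1]))
```

Because the dependency block travels with the file, a runner like `uv run` can execute it without a separately managed virtual environment.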
Designing Agent‑Friendly Scripts
Avoid interactive prompts; provide clear error messages and a --help usage description.
Emit structured output (JSON, CSV) on stdout and diagnostics on stderr.
Make commands idempotent and expose meaningful exit codes.
Offer `--dry-run` or `--output` flags for large results.
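The guidelines above might combine into a skeleton like this (a hypothetical sketch; the flag names and output schema are illustrative):

```python
# Hypothetical agent-friendly CLI: no interactive prompts, JSON on stdout,
# diagnostics on stderr, and a meaningful exit code.
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Validate an input file.")
    parser.add_argument("path", help="file to validate")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would be done without doing it")
    return parser


def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    print(f"validating {args.path}", file=sys.stderr)  # diagnostics -> stderr
    result = {"path": args.path, "dry_run": args.dry_run, "ok": True}
    print(json.dumps(result))                          # structured output -> stdout
    return 0                                           # non-zero would signal failure


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Keeping structured results on stdout and chatter on stderr lets the agent pipe the output straight into its next step.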
Final Thoughts
Agent Skills are a lightweight, structured way to give a general‑purpose LLM domain expertise without modifying the model itself. The progressive‑disclosure principle keeps context size low while enabling sophisticated, repeatable workflows. By turning repetitive, error‑prone tasks into well‑defined Skills, you not only empower the AI assistant but also create a living knowledge base for your team.
