Anthropic Reveals Top Practices for Building Skills in Claude Code

Anthropic’s internal analysis of hundreds of Claude Code skills shows that verification‑oriented skills deliver the greatest boost to AI coding assistant output, and it outlines nine skill categories, seven design principles, on‑demand hooks, and distribution strategies for effective agent development.

PaperAgent

After deploying hundreds of skills for Claude Code, Anthropic found that the biggest lever on an AI programming assistant's output quality is not a skill that teaches code generation but one that teaches code verification.

What a Claude Code skill actually is

A skill is a folder, not a single markdown file. The folder can contain scripts, data, templates, and configuration files that the agent discovers and reads on demand. SKILL.md serves only as an entry point that points to other files.

This approach implements progressive disclosure: instead of loading all information at once, the skill tells the agent which files are available and lets it decide when to read each one.

Skill folder structure

Nine skill categories

Library and API reference: teach the agent correct use of internal libraries/CLI/SDK (e.g., billing-lib, internal-platform-cli).

Product verification: teach the agent how to test/verify that code works (e.g., signup-flow-driver, checkout-verifier).

Data acquisition and analysis: connect to the data stack and provide query paths (e.g., funnel-query, datadog).

Business workflow automation: compress repetitive workflows into a single command (e.g., standup-post, weekly-recap).

Code scaffolding: generate framework templates and boilerplate (e.g., new-migration, create-app).

Code quality and review: enforce style and review processes (e.g., adversarial-review, code-style).

CI/CD and deployment: push code, deploy, monitor PRs (e.g., babysit-pr, deploy-<service>).

Operations handbook: from symptom to multi‑tool investigation to structured report (e.g., oncall-runner, log-correlator).

Infrastructure operations: daily maintenance with safeguards for destructive actions (e.g., <resource>-orphans, cost-investigation).

Nine skill categories overview

Verification skills deliver the biggest ROI

Anthropic states plainly that verification skills have had the most measurable impact on Claude’s output quality internally. It can be worth having an engineer spend a full week perfecting a single verification skill, because self‑verification is the agent’s weakest link.

Effective verification involves running the generated code in a headless browser, asserting state at each step, using Stripe test cards for payments, and validating CLI interactions in a TTY—tasks the agent cannot infer on its own.
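The "assert state at each step" idea can be sketched without a browser. What follows is a minimal illustration, not Anthropic's implementation: `Step`, `run_verification`, and the toy signup state are hypothetical names, and in a real skill the actions would drive a headless browser or a CLI session.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


@dataclass
class Step:
    """One verification step: do something, then check the resulting state."""
    name: str
    action: Callable[[State], None]   # mutates state (e.g. clicks a button)
    check: Callable[[State], bool]    # states what must now be true


def run_verification(steps: List[Step], state: State) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failed check."""
    report: List[str] = []
    for step in steps:
        step.action(state)
        if not step.check(state):
            report.append(f"FAIL {step.name}")
            return False, report
        report.append(f"ok   {step.name}")
    return True, report


# Toy stand-in for a signup flow; a real driver would act on a browser page.
signup_steps = [
    Step("open form", lambda s: s.update(page="signup"), lambda s: s["page"] == "signup"),
    Step("submit email", lambda s: s.update(email="a@example.com"), lambda s: "@" in str(s["email"])),
    Step("land on dashboard", lambda s: s.update(page="dashboard"), lambda s: s["page"] == "dashboard"),
]

if __name__ == "__main__":
    passed, lines = run_verification(signup_steps, {})
    print("\n".join(lines))
    print("PASSED" if passed else "FAILED")
```

Stopping at the first failed check keeps the report short and points the agent at the exact step that broke.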

Do not state the obvious

Seven principles for writing skills

1. Avoid stating the obvious

Claude already writes and reads code. A skill that merely repeats what Claude would do adds context overhead without value. Focus on information that pushes Claude beyond its default behavior, such as a frontend‑design skill that corrects Claude’s “aesthetic inertia”.

2. Gotchas are the highest‑density signal area

The most valuable part of any skill is the “gotchas” section, which records pitfalls discovered from repeated agent failures (e.g., the latest row in an append‑only subscriptions table is identified by the highest version number, not by created_at; the same identifier appears as @request_id in the API gateway and trace_id in the billing service; a 200 response from a staging environment does not guarantee that a Stripe webhook was processed).
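The first gotcha above is easy to show in code. This is a sketch under assumptions: only `version` and `created_at` come from the text, while `subscription_id`, `plan`, and the sample rows are made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def latest_subscription(rows: List[Row], subscription_id: str) -> Row:
    """Return the current row for a subscription in an append-only table.

    The current row is the one with the highest version, not the newest created_at.
    """
    candidates = [r for r in rows if r["subscription_id"] == subscription_id]
    if not candidates:
        raise KeyError(subscription_id)
    return max(candidates, key=lambda r: r["version"])


rows = [
    {"subscription_id": "sub_1", "version": 1, "created_at": "2024-01-01", "plan": "free"},
    # Backfilled row: highest version but an older created_at timestamp.
    {"subscription_id": "sub_1", "version": 3, "created_at": "2023-12-15", "plan": "team"},
    {"subscription_id": "sub_1", "version": 2, "created_at": "2024-02-01", "plan": "pro"},
]

current = latest_subscription(rows, "sub_1")
print(current["plan"])  # prints "team"; picking the newest created_at would give "pro"
```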

Gotchas and file system

3. Treat the file system as a context‑engine

SKILL.md should point to other files instead of containing everything. Use references/api.md for detailed signatures, assets/ for output templates, and scripts/ for helper scripts. This lets Claude discover available resources on demand and reduces context load.
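To make the layout concrete, here is a small sketch that builds a throwaway skill folder and lists its files by path without reading them, which is roughly the information an entry point needs to expose. The skill name `demo-skill` and the file names under assets/ and scripts/ are hypothetical; only SKILL.md, references/api.md, assets/, and scripts/ come from the text.

```python
import tempfile
from pathlib import Path
from typing import List


def list_skill_resources(skill_dir: Path) -> List[str]:
    """List files in a skill folder by relative path, without reading contents."""
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file()
    )


def build_demo_skill(root: Path) -> Path:
    """Create a throwaway skill folder using the layout named in the text."""
    skill = root / "demo-skill"  # hypothetical skill name
    for rel in ("SKILL.md", "references/api.md", "assets/report-template.md", "scripts/helper.py"):
        path = skill / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder\n", encoding="utf-8")
    return skill


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in list_skill_resources(build_demo_skill(Path(tmp))):
            print(name)
```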

4. Don’t over‑constrain Claude

Skills that are too specific can harm reuse. Provide information while preserving flexibility—for example, say “usually do A first, but adjust based on the situation” instead of “must do A then B then C”.

Avoid over‑restriction

5. Think through initialization

If a skill needs user configuration (e.g., which Slack channel to post a standup to), store it in config.json and have the agent ask the user via AskUserQuestion when the config is missing.
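A helper script inside the skill could report which settings are still missing so the agent knows what to ask. This is a sketch under assumptions: the `slack_channel` key and both function names are hypothetical, and the actual question would be put to the user by the agent, not by this script.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

REQUIRED_KEYS = ("slack_channel",)  # hypothetical setting for a standup skill


def load_config(path: Path) -> Tuple[Dict[str, str], List[str]]:
    """Return (config, missing_keys); a missing or empty file means everything is missing."""
    config: Dict[str, str] = {}
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8") or "{}")
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    return config, missing


def save_answer(path: Path, config: Dict[str, str], key: str, value: str) -> None:
    """Persist a user's answer so later runs do not ask again."""
    config[key] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

On first run `load_config` returns `["slack_channel"]` as missing; once the answer is saved, later runs get an empty list and can proceed without prompting.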

Initialization config

6. Write descriptions for the model, not humans

Claude lists skill descriptions at startup and uses them to decide which skill to invoke, so each description should state the trigger condition rather than act as a human‑readable summary. For example, the description for babysit-pr must contain the keyword “babysit” so that a user utterance like “babysit my PR” matches.
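One rough way to catch summary-style descriptions is a lint that checks for expected trigger phrases. This is only a sketch: the function, the hand-maintained phrase list, and both example descriptions are hypothetical, and a substring check is a crude proxy for how the model actually matches a request to a skill.

```python
from typing import List


def missing_triggers(description: str, trigger_phrases: List[str]) -> List[str]:
    """Return trigger phrases that do not appear in the description (case-insensitive)."""
    lowered = description.lower()
    return [phrase for phrase in trigger_phrases if phrase.lower() not in lowered]


# Hypothetical descriptions for the babysit-pr skill named in the text.
summary_style = "Monitors pull requests and handles CI failures."
trigger_style = "Use when the user asks to babysit a PR or watch CI until it merges."

print(missing_triggers(summary_style, ["babysit"]))  # prints ['babysit']
print(missing_triggers(trigger_style, ["babysit"]))  # prints []
```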

Descriptions for the model

7. Enable Claude to remember

Skills can embed memory, often as an append‑only log. For instance, after each standup post, append the content to standups.log so the next run can reference previous entries and output only the incremental update.
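The append‑only log can be as simple as one line per entry. The sketch below assumes a tab‑separated `date<TAB>text` format and two hypothetical helper names; only the standups.log file name comes from the text.

```python
from datetime import date
from pathlib import Path
from typing import List


def append_entry(log_path: Path, text: str, day: date) -> None:
    """Append one entry as a single line; existing lines are never rewritten."""
    one_line = " ".join(text.split())  # collapse whitespace so each entry stays on one line
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{day.isoformat()}\t{one_line}\n")


def recent_entries(log_path: Path, limit: int = 3) -> List[str]:
    """Return the last `limit` entries, oldest first, or [] if no log exists yet."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return lines[-limit:]
```

Before drafting a new standup, the skill can read `recent_entries(...)` and write only what changed since the last line.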

Help Claude remember

Two on‑demand hook patterns

Skills can register hooks that activate only when the skill is invoked, then shut down:

/careful – intercept dangerous commands such as rm -rf, DROP TABLE, force-push, kubectl delete. Enable only in production‑critical contexts.

/freeze – restrict edit/write operations to a specified directory, preventing accidental modifications while debugging.
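The decision logic behind both hooks can be sketched as two pure functions. This is an illustration under assumptions: the pattern list and function names are hypothetical, and how a hook receives the tool call and signals a block is left to Claude Code's hook interface and not shown here.

```python
import re
from pathlib import Path

# Illustrative patterns only; a real /careful list would be tuned per team.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r"),      # rm -rf, rm -fr
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),         # DROP TABLE
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),      # force-push
    re.compile(r"\bkubectl\s+delete\b"),                    # kubectl delete
]


def is_dangerous(command: str) -> bool:
    """True if a shell command matches any pattern on the block list (the /careful idea)."""
    return any(p.search(command) for p in DANGEROUS_PATTERNS)


def is_edit_allowed(target: str, allowed_dir: str) -> bool:
    """True if `target` resolves to a path inside `allowed_dir` (the /freeze idea)."""
    allowed = Path(allowed_dir).resolve()
    resolved = Path(target).resolve()
    return resolved == allowed or allowed in resolved.parents
```

Resolving both paths before comparing them means a target such as `src/../secrets/key` is judged by where it actually lands, not by its textual prefix.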

On‑demand hooks

Distribution and metrics

Skills are distributed in two ways: small teams commit directly to the .claude/skills directory in a repo; larger organizations use an internal Plugin Marketplace to avoid context overhead from many skills.

Anthropic does not enforce centralized approval. Skills grow organically: contributors publish to a sandbox, announce in Slack, and successful skills are later PR‑ed into the marketplace.

Usage is measured with the PreToolUse hook, which logs skill invocations, revealing popular skills and missed trigger opportunities.
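Once invocations are logged, ranking skills by use is a short tally. The log format below (one JSON object per line with a `skill` field) is an assumption made for this sketch, not the format Anthropic's hook writes.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def tally_skill_usage(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count invocations per skill from JSON-lines records, most used first."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        counts[record["skill"]] += 1
    return counts.most_common()


# Hypothetical log format: one JSON object per invocation, written by a logging hook.
sample_log = [
    '{"skill": "babysit-pr", "ts": "2024-05-01T09:00:00Z"}',
    '{"skill": "standup-post", "ts": "2024-05-01T09:05:00Z"}',
    '{"skill": "babysit-pr", "ts": "2024-05-01T11:30:00Z"}',
]

print(tally_skill_usage(sample_log))  # prints [('babysit-pr', 2), ('standup-post', 1)]
```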

What this means

The core takeaway for anyone using AI coding tools is that investing time in teaching the AI how to verify code yields higher returns than teaching it how to write code. Verification skills capture domain‑specific gotchas—such as the equivalence of @request_id and trace_id or the unreliability of a staging 200 response—making them the most cost‑effective improvement point.

Skills evolve organically: each new pitfall added by engineers becomes part of the skill, and the best skills emerge from this incremental process.

Lessons from building Claude Code: How we use skills
https://claude.com/blog/lessons-from-building-claude-code-how-we-use-skills
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
