9 Hard‑Earned Lessons from Anthropic Engineers on Building Claude Code Skills

Anthropic engineers share nine practical lessons drawn from hundreds of Claude Code Skills: folder‑based design, nine skill categories, the importance of Gotchas, the description as a trigger, giving Claude code, memory handling, flexible hooks, and team distribution strategies.

ShiZhen AI

Skills Are Folders, Not Markdown Files

The first paradigm shift is treating a Skill as a directory that can contain scripts, assets, reference documents, and configuration files, rather than a single SKILL.md. This enables progressive disclosure: Claude reads files only when needed, reducing context overload.

Example: place a Markdown template in an assets/ folder for report generation, and store API signatures in references/api.md for Claude to read on demand.
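The layout described above can be sketched as a small scaffolding script. This is a minimal illustration, not an official tool: the Skill name `weekly-report`, the file `assets/report-template.md`, and the wording of the SKILL.md body are hypothetical, while `assets/` and `references/api.md` come from the example in the text.

```python
from pathlib import Path
import tempfile

# Hypothetical SKILL.md body: a short entry point that tells Claude
# which files exist, so it can open them only when they are needed.
SKILL_MD = """\
---
name: {name}
description: {description}
---

# {name}

Use assets/report-template.md as the starting point for every report.
Read references/api.md only when you need exact API signatures.

## Gotchas

(Add real failures here as they are discovered.)
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create a Skill as a folder: SKILL.md plus assets/ and references/."""
    skill_dir = root / name
    (skill_dir / "assets").mkdir(parents=True, exist_ok=True)
    (skill_dir / "references").mkdir(exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        SKILL_MD.format(name=name, description=description), encoding="utf-8"
    )
    (skill_dir / "assets" / "report-template.md").write_text(
        "# Report\n\n## Summary\n\n## Details\n", encoding="utf-8"
    )
    (skill_dir / "references" / "api.md").write_text(
        "# API signatures\n", encoding="utf-8"
    )
    return skill_dir


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    skill = scaffold_skill(
        Path(tmp),
        "weekly-report",  # hypothetical Skill name
        "Use when the user asks for a weekly status report.",
    )
    created = sorted(
        p.relative_to(skill).as_posix() for p in skill.rglob("*") if p.is_file()
    )
    print(created)
```

Only SKILL.md needs to be short; the template and the reference file can grow without costing context until Claude actually opens them.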

9 Skill Categories

Thariq’s team classified internal Skills into nine groups, noting that many engineers only cover two or three categories.

Knowledge & Reference: guides for internal libraries, CLI, SDK usage (e.g., billing-lib, internal-platform-cli, frontend-design).

Verification: scripts that test or verify code, often using Playwright or tmux (e.g., signup-flow-driver, checkout-verifier, tmux-cli-driver).

Data Access: connectors to databases, dashboards, or query workflows (e.g., funnel-query, cohort-compare, grafana).

Automation: compress repetitive actions into a single command, logging results for consistency (e.g., standup-post, create-ticket, weekly-recap).

Scaffolding: generate boilerplate for modules where natural‑language requirements exist (e.g., new-workflow, new-migration, create-app).

Code Review: enforce quality via deterministic scripts or Git hooks (e.g., adversarial-review, code-style, testing-practices).

Deploy: pull, push, and deploy code, possibly chaining other Skills (e.g., babysit-pr, deploy-, cherry-pick-prod).

Debugging: receive symptoms and run multi‑tool investigation pipelines (e.g., -debugging, oncall-runner, log-correlator).

Operations: routine maintenance with safety steps (e.g., -orphans, dependency-management, cost-investigation).

Gotchas Are the Most Valuable Part

Each Skill should contain a "Gotchas" section that records real failures Claude has run into while using the Skill. Update it whenever a new pitfall appears. This is where a Skill adds the most reliability: Claude’s default knowledge already covers generic code generation, so the Skill’s value lies in what Claude would otherwise get wrong.

Description Field Is the Trigger

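Claude indexes the description of every Skill at startup and scans that index to decide whether a Skill applies to a request. The description should therefore state *when* the Skill should be used, not just *what* it does. A description phrased as a trigger (for example, "Use when the user asks to post a standup update") is what makes the Skill discoverable during user queries.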

Give Claude Code, Not Just Text Instructions

Embedding scripts and helper functions inside a Skill lets Claude focus on orchestration and decision‑making. For example, a data‑analysis Skill can expose helper functions that fetch events; Claude then generates a short script that calls those helpers to answer questions like "What happened last Tuesday?".
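A minimal sketch of that split follows. The `Event` type, the in-memory sample data, and the helper names `fetch_events` and `most_recent_weekday` are all hypothetical stand-ins; a real Skill would query its own event store. The last few lines are the kind of short script Claude might compose on top of the helpers.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    name: str


# Stand-in data; a real helper would read from a database or an API.
_SAMPLE_EVENTS = [
    Event(datetime(2024, 5, 13, 9, 0), "deploy_started"),
    Event(datetime(2024, 5, 14, 10, 30), "signup_spike"),
    Event(datetime(2024, 5, 14, 16, 45), "checkout_error"),
    Event(datetime(2024, 5, 15, 8, 15), "deploy_finished"),
]


def fetch_events(day: date) -> list[Event]:
    """Helper shipped with the Skill: all events on the given day."""
    return [e for e in _SAMPLE_EVENTS if e.timestamp.date() == day]


def most_recent_weekday(today: date, weekday: int) -> date:
    """Latest date strictly before `today` that falls on `weekday` (Monday=0)."""
    days_back = (today.weekday() - weekday) % 7 or 7
    return today - timedelta(days=days_back)


# The orchestration part: "What happened last Tuesday?"
TUESDAY = 1
today = date(2024, 5, 16)  # fixed date so the example is reproducible
last_tuesday = most_recent_weekday(today, TUESDAY)
names = [e.name for e in fetch_events(last_tuesday)]
print(last_tuesday, names)
```

The helpers carry the details that are easy to get wrong (data access, date arithmetic), so the generated script stays a few lines long.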

Skills Can Have Their Own Memory

Skills may store state in files, such as simple append‑only logs or SQLite databases. A standup-post Skill can maintain a standups.log to compare current output with previous runs, enabling cross‑session continuity.

When persisting data, place it under the stable directory provided by ${CLAUDE_PLUGIN_DATA} to avoid loss during Skill upgrades.
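As a rough sketch of such a log, the snippet below appends one JSON line per standup and reads back the most recent one. It assumes `CLAUDE_PLUGIN_DATA` is visible to the script as an environment variable; the fallback directory `.skill-data`, the JSON-lines format, and the function names are hypothetical choices made for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def data_dir() -> Path:
    """Stable data folder; falls back to a local directory (an assumption)."""
    base = os.environ.get("CLAUDE_PLUGIN_DATA") or ".skill-data"
    path = Path(base)
    path.mkdir(parents=True, exist_ok=True)
    return path


def log_path() -> Path:
    return data_dir() / "standups.log"


def append_standup(text: str) -> None:
    """Append-only: earlier entries are never rewritten."""
    entry = {"at": datetime.now(timezone.utc).isoformat(), "text": text}
    with log_path().open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def previous_standup() -> Optional[str]:
    """Text of the most recent entry, or None when the log is empty."""
    path = log_path()
    if not path.exists():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1])["text"] if lines else None


# Demo against a temporary directory standing in for the data folder.
with tempfile.TemporaryDirectory() as tmp:
    os.environ["CLAUDE_PLUGIN_DATA"] = tmp
    before = previous_standup()
    append_standup("Shipped login fix")
    append_standup("Reviewed billing PR")
    latest = previous_standup()
    line_count = len(log_path().read_text(encoding="utf-8").splitlines())
    os.environ.pop("CLAUDE_PLUGIN_DATA")
print(before, latest, line_count)
```

Because the log lives outside the Skill folder, replacing the Skill's files during an upgrade leaves the history intact.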

Hooks Activate Only When Called

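Hooks defined inside a Skill are activated only when that Skill is called and remain in effect for the rest of the session, after which they disappear. This makes them ideal for opinionated, context‑specific safeguards that would be too intrusive to run all the time (e.g., blocking rm -rf, DROP TABLE, or kubectl delete in production).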
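The decision logic of such a safeguard might look like the sketch below. It assumes the hook receives a JSON payload carrying `tool_name` and `tool_input.command` for Bash calls, and that exit code 2 from a PreToolUse hook blocks the call; verify both against the current Claude Code hooks documentation. The patterns are deliberately simple and the function name `decide` is hypothetical.

```python
import re

# Illustrative patterns only; a production guard would need more care
# (quoting, aliases, flags split across arguments, and so on).
BLOCKED_PATTERNS = [
    (re.compile(r"\brm\s+-(?=\w*[rR])(?=\w*f)\w+"), "rm -rf"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), "DROP TABLE"),
    (re.compile(r"\bkubectl\s+delete\b"), "kubectl delete"),
]


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one tool call.

    0 lets the call through; 2 is assumed to block it.
    """
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern, label in BLOCKED_PATTERNS:
        if pattern.search(command):
            return 2, f"Blocked: command contains {label}"
    return 0, ""


# Sample payloads shaped like the assumed hook input.
samples = [
    {"tool_name": "Bash", "tool_input": {"command": "rm -rf build"}},
    {"tool_name": "Bash", "tool_input": {"command": "psql -c 'drop table users'"}},
    {"tool_name": "Bash", "tool_input": {"command": "ls -la"}},
    {"tool_name": "Edit", "tool_input": {"file_path": "app.py"}},
]
codes = [decide(p)[0] for p in samples]
print(codes)
```

A hook entry point would read the payload from standard input, call `decide`, write the message to standard error, and exit with the returned code.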

Distributing Skills Within a Team

Two distribution methods exist: commit Skills to the repository’s .claude/skills folder for small teams, or publish them to an internal plugin marketplace for larger organizations. Anthropic’s practice relies on organic sharing via GitHub sandboxes and Slack, with popular Skills eventually merged into the marketplace.

Before release, filter out duplicate or low‑quality Skills. Skills can reference each other by name (e.g., a CSV‑generation Skill depending on a file‑upload Skill), although native dependency management is not yet available.

Thariq’s team logs each Skill invocation using a PreToolUse hook, enabling analysis of usage frequency and identification of poorly described Skills.
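One way to picture this kind of usage logging is sketched below. The payload shape is an assumption: the field `tool_input.skill` as the carrier of the Skill name is hypothetical, as are the log file name and both function names. Only the idea of logging from a PreToolUse hook and counting afterwards comes from the text.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def record_invocation(payload: dict, log: Path) -> None:
    """Append one JSON line per Skill invocation seen by the hook."""
    tool_input = payload.get("tool_input", {})
    entry = {
        "session": payload.get("session_id"),
        "skill": tool_input.get("skill", "unknown"),  # assumed field name
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def usage_counts(log: Path) -> Counter:
    """Invocations per Skill, for spotting rarely triggered Skills."""
    counts: Counter = Counter()
    for line in log.read_text(encoding="utf-8").splitlines():
        counts[json.loads(line)["skill"]] += 1
    return counts


# Demo with three made-up invocations.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "skill-usage.jsonl"
    for name in ["standup-post", "weekly-recap", "standup-post"]:
        record_invocation(
            {"session_id": "demo", "tool_input": {"skill": name}}, log
        )
    counts = usage_counts(log)
print(counts.most_common())
```

A Skill that is installed widely but almost never appears in the counts is a candidate for a rewritten description.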
