9 Hard‑Earned Lessons from Anthropic Engineers on Building Claude Code Skills
Anthropic engineers share nine practical lessons drawn from building hundreds of Claude Code Skills. The lessons cover folder-based design, nine skill categories, the importance of Gotchas, the description as a trigger, giving Claude code, memory handling, on-demand hooks, and team distribution strategies.
Skills Are Folders, Not Markdown Files
The first paradigm shift is treating a Skill as a directory that can contain scripts, assets, reference documents, and configuration files, rather than a single SKILL.md. This enables progressive disclosure: Claude reads files only when needed, reducing context overload.
Example: place a Markdown template in an assets/ folder for report generation, and store API signatures in references/api.md for Claude to read on demand.
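To make the folder layout concrete, here is a minimal sketch that scaffolds such a Skill on disk. The Skill name `weekly-report`, the file contents, and the helper names `scaffold_skill` and `layout` are hypothetical; only the `assets/` and `references/api.md` paths come from the example above.

```python
from pathlib import Path
import tempfile

# Hypothetical entry-point text, for illustration only.
SKILL_MD = """\
---
name: weekly-report
description: Use when the user asks for a weekly status report.
---
Fill in assets/report-template.md. Read references/api.md only if you
need the exact API signatures.
"""


def scaffold_skill(root: Path, name: str) -> Path:
    """Create a folder-shaped Skill: entry point, assets, references."""
    skill = root / name
    (skill / "assets").mkdir(parents=True)
    (skill / "references").mkdir()
    (skill / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
    (skill / "assets" / "report-template.md").write_text(
        "# Weekly report\n", encoding="utf-8"
    )
    (skill / "references" / "api.md").write_text(
        "## API signatures\n", encoding="utf-8"
    )
    return skill


def layout(skill: Path) -> list[str]:
    """List every file in the Skill folder, relative to its root."""
    return sorted(
        p.relative_to(skill).as_posix() for p in skill.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in layout(scaffold_skill(Path(tmp), "weekly-report")):
            print(path)
```

The entry point stays short and only points at the other files, so the template and the API reference cost no context until Claude actually opens them.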
9 Skill Categories
Thariq’s team classified internal Skills into nine groups, noting that many engineers only cover two or three categories.
Knowledge & Reference: guides for using internal libraries, CLIs, and SDKs (e.g., billing-lib, internal-platform-cli, frontend-design).
Verification: scripts that test or verify code, often using Playwright or tmux (e.g., signup-flow-driver, checkout-verifier, tmux-cli-driver).
Data Access: connectors to databases, dashboards, or query workflows (e.g., funnel-query, cohort-compare, grafana).
Automation: compress repetitive actions into a single command and log the results for consistency (e.g., standup-post, create-ticket, weekly-recap).
Scaffolding: generate boilerplate for new modules, especially where the scaffolding comes with natural-language requirements (e.g., new-workflow, new-migration, create-app).
Code Review: enforce quality via deterministic scripts or Git hooks (e.g., adversarial-review, code-style, testing-practices).
Deploy: pull, push, and deploy code, possibly chaining other Skills (e.g., babysit-pr, deploy-, cherry-pick-prod).
Debugging: take a symptom as input and run a multi-tool investigation pipeline (e.g., -debugging, oncall-runner, log-correlator).
Operations: routine maintenance with built-in safety steps (e.g., -orphans, dependency-management, cost-investigation).
Gotchas Are the Most Valuable Part
Each Skill should contain a "Gotchas" section that records real failures Claude has run into. Claude already handles generic coding tasks well by default, so the pitfalls specific to your codebase are where a Skill adds the most value. Updating this section whenever a new pitfall appears steadily improves reliability.
Description Field Is the Trigger
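Claude indexes the description of every Skill at startup. The description must state *when* the Skill should be used, not *what* it does. Claude matches a user request against these descriptions to decide which Skill applies, so a description written as a trigger condition is what makes the Skill discoverable.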
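One way to keep descriptions trigger-shaped is a small lint check. The sketch below is a heuristic under stated assumptions: it assumes SKILL.md starts with `---` delimited front matter containing a `description:` line, and the trigger phrases in `TRIGGER_PATTERN` are hypothetical conventions, not anything Claude itself requires.

```python
import re

# Hypothetical trigger phrasings; adjust to your team's conventions.
TRIGGER_PATTERN = re.compile(
    r"\b(use (this )?when|use for|when the user|whenever)\b", re.IGNORECASE
)


def read_description(skill_md: str) -> str:
    """Pull the description value out of simple '---' delimited front matter."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip()
    return ""


def states_when(description: str) -> bool:
    """True if the description reads like a trigger condition."""
    return bool(TRIGGER_PATTERN.search(description))


# A "what it does" description versus a "when to use it" description.
WHAT = """---
name: funnel-query
description: Runs funnel queries against the events warehouse.
---
"""
WHEN = """---
name: funnel-query
description: Use when the user asks about signup or checkout conversion rates.
---
"""

if __name__ == "__main__":
    print(states_when(read_description(WHAT)))
    print(states_when(read_description(WHEN)))
```

A check like this only catches phrasing; whether the trigger condition is the right one still needs a human or usage data to confirm.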
Give Claude Code, Not Just Text Instructions
Embedding scripts and helper functions inside a Skill lets Claude focus on orchestration and decision‑making. For example, a data‑analysis Skill can expose helper functions that fetch events; Claude then generates a short script that calls those helpers to answer questions like "What happened last Tuesday?".
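As a rough illustration of that split, the sketch below pairs a helper a Skill might ship with the kind of short script Claude could write on top of it. Everything here is hypothetical: `Event`, `fetch_events`, `last_weekday`, and the in-memory sample data stand in for a real event store.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    day: date
    name: str
    count: int


# Stand-in data; a real helper would query your event store.
_EVENTS = [
    Event(date(2024, 5, 7), "signup", 120),
    Event(date(2024, 5, 7), "checkout", 45),
    Event(date(2024, 5, 8), "signup", 98),
]


def fetch_events(day: date) -> list[Event]:
    """Hypothetical helper shipped inside the Skill."""
    return [e for e in _EVENTS if e.day == day]


def last_weekday(today: date, weekday: int) -> date:
    """Most recent date strictly before `today` on `weekday` (Monday=0)."""
    days_back = (today.weekday() - weekday) % 7 or 7
    return today - timedelta(days=days_back)


# The kind of short script Claude might generate on top of the helpers:
def summarize(day: date) -> dict[str, int]:
    return {e.name: e.count for e in fetch_events(day)}


if __name__ == "__main__":
    tuesday = last_weekday(date(2024, 5, 10), weekday=1)
    print(tuesday, summarize(tuesday))
```

The helpers carry the fiddly parts (data access, date arithmetic), so the generated script stays a few lines long and is easy to review.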
Skills Can Have Their Own Memory
Skills may store state in files within their directory—simple append‑only logs or SQLite databases. A standup-post Skill can maintain a standups.log to compare current output with previous runs, enabling cross‑session continuity.
When persisting data, place it under the stable directory provided by ${CLAUDE_PLUGIN_DATA} to avoid loss during Skill upgrades.
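A minimal sketch of such an append-only log follows. It assumes the value of `${CLAUDE_PLUGIN_DATA}` reaches the script as an environment variable of the same name; the `.skill-data` fallback directory, the JSON Lines format, and the function names are illustrative assumptions.

```python
import json
import os
from pathlib import Path
from typing import Optional


def data_dir() -> Path:
    """Resolve the persistent directory for Skill state.

    Assumption: CLAUDE_PLUGIN_DATA is available as an environment variable.
    The '.skill-data' fallback is hypothetical, for running outside a plugin.
    """
    root = Path(os.environ.get("CLAUDE_PLUGIN_DATA", ".skill-data"))
    root.mkdir(parents=True, exist_ok=True)
    return root


def append_standup(entry: dict, log_name: str = "standups.log") -> None:
    """Append one run as a single JSON line; earlier lines are never rewritten."""
    with (data_dir() / log_name).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def previous_standup(log_name: str = "standups.log") -> Optional[dict]:
    """Return the most recent entry, or None if nothing has been logged yet."""
    path = data_dir() / log_name
    if not path.exists():
        return None
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return json.loads(lines[-1]) if lines else None
```

On each run the Skill can call `previous_standup()` first, compare it with the new output, and then call `append_standup(...)` to record the current run.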
Hooks Activate Only When Called
Hooks defined inside a Skill are activated only when that Skill is invoked and are gone once the session ends, so they never interfere with unrelated work. This makes them ideal for context-specific safeguards (e.g., blocking rm -rf, DROP TABLE, or kubectl delete in production).
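A guard of this kind could be sketched as a small script. The sketch assumes the hook receives a JSON payload on stdin with the shell command under `tool_input.command`, and that exiting with code 2 blocks the tool call; check both against the hooks documentation for your Claude Code version. The pattern list and the function names are illustrative.

```python
import json
import re
import sys
from typing import Optional

# Patterns based on the examples in the text; extend as needed.
BLOCKED = [
    ("rm -rf", re.compile(r"\brm\s+-(?:rf|fr)\b")),
    ("DROP TABLE", re.compile(r"\bdrop\s+table\b", re.IGNORECASE)),
    ("kubectl delete", re.compile(r"\bkubectl\s+delete\b")),
]


def blocked_reason(command: str) -> Optional[str]:
    """Return the label of the first dangerous pattern found, if any."""
    for label, pattern in BLOCKED:
        if pattern.search(command):
            return label
    return None


def decide(payload: dict) -> tuple[int, str]:
    """Map one hook payload to (exit_code, message)."""
    command = payload.get("tool_input", {}).get("command", "")
    reason = blocked_reason(command)
    if reason:
        # Assumed convention: exit code 2 blocks the call.
        return 2, f"Blocked by skill hook: {reason}"
    return 0, ""


def main() -> None:
    """Entry point when the script is registered as a hook command."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Keeping the decision in `decide` makes the rules testable without a running session; the hook script itself only needs to call `main()`.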
Distributing Skills Within a Team
Two distribution methods exist: commit Skills to the repository’s .claude/skills folder for small teams, or publish them to an internal plugin marketplace for larger organizations. Anthropic’s practice relies on organic sharing via GitHub sandboxes and Slack, with popular Skills eventually merged into the marketplace.
Before release, filter out duplicate or low‑quality Skills. Skills can reference each other by name (e.g., a CSV‑generation Skill depending on a file‑upload Skill), although native dependency management is not yet available.
Thariq’s team logs each Skill invocation using a PreToolUse hook, enabling analysis of usage frequency and identification of poorly described Skills.
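The analysis side of that logging can be sketched as follows. The source does not say what the hook records, so the log format here (one JSON object per line with a `skill` field) is an assumption, as are the function names and the threshold.

```python
import json
from collections import Counter
from typing import Iterable


def usage_counts(log_lines: Iterable[str]) -> Counter:
    """Count invocations per Skill from JSON Lines records.

    Assumption: each record has a 'skill' field written by the logging hook.
    """
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line)["skill"]] += 1
    return counts


def rarely_used(counts: Counter, known_skills: Iterable[str], threshold: int = 1) -> list[str]:
    """Skills invoked at most `threshold` times: candidates for a better description."""
    return sorted(name for name in known_skills if counts.get(name, 0) <= threshold)


if __name__ == "__main__":
    sample_log = [
        '{"skill": "standup-post"}',
        '{"skill": "standup-post"}',
        '{"skill": "create-ticket"}',
        '{"skill": "standup-post"}',
    ]
    counts = usage_counts(sample_log)
    print(counts.most_common())
    print(rarely_used(counts, ["standup-post", "create-ticket", "weekly-recap"]))
```

A Skill that is installed but rarely triggered is not necessarily a bad Skill; low counts are a prompt to review its description first.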
ShiZhen AI
Tech blogger with over 10 years of experience at leading tech firms, AI efficiency and delivery expert focusing on AI productivity. Covers tech gadgets, AI-driven efficiency, and leisure— AI leisure community. 🛰 szzdzhp001
