Designing Agent Tools: Key Lessons from Claude Code’s Action Space
This article distills the Claude Code team's hard‑won insights on building effective AI agents: why action‑space design matters more than raw model capability, how structured questioning improves bandwidth, when to replace Todos with Tasks, and a repeatable seven‑step loop for evolving toolsets.
Background
The Claude Code team, led by Thariq, published a detailed post titled Lessons from Building Claude Code: Seeing like an Agent. It shares the pitfalls, trial‑and‑error paths, and the methodology they ultimately settled on for building agent‑centric tools.
Core Insight: Action Space Design
The hardest problem in agent development is not model intelligence but designing the action space: the set of tools the model can invoke. Too many tools overwhelm the model; too few limit its capability. The goal is to give the model tools it can use well, and from whose failures it can recover.
Thariq’s mantra is to learn to see like an agent: observe real dialogues, spot where the model gets stuck, and iterate on the tool design.
TL;DR – Key Takeaways
Action space design determines the agent’s behavior.
Tool strength matters less than the model’s ability to use it.
Structured questioning (AskUserQuestion) dramatically reduces bandwidth loss.
Over‑engineered tools become constraints as models improve.
Progressive disclosure (layered knowledge) is more stable than stuffing everything into prompts.
Keep tool count low; each new tool adds a failure point.
1️⃣ Action Space Is Your Product
Many teams pile on capabilities—web access, database queries, multiple models—only to discover two common failures after launch:
The model hesitates at dozens of entry points, trying the wrong tool.
Developers cannot trace why a particular action was chosen, making debugging hard.
In Claude’s API, tools are built from primitives such as bash, skills, and code execution. The design dilemma becomes whether to expose a single “universal” tool or many specialized tools.
You face a tough math problem. Which tool would you choose?
Paper & pencil: low‑tech, slow.
Calculator: faster, but requires skill.
Computer: most powerful, but requires coding.
For agents, give the model the tool it can use effectively, not the one you think is strongest.
Practical risk/ability matrix (simplified):
Low risk, low ability: read‑only retrieval; safe but limited.
Medium risk, composable: structured queries, task management; higher bandwidth, needs clear contracts.
High risk, high upside: bash, code execution, network access; powerful, but requires strict permissions, state handling, and recoverability.
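The matrix above can be turned into a simple permission gate. The sketch below is illustrative only: the tool names, tiers, and registry are assumptions for the example, not Claude Code's actual permission model.

```python
from enum import Enum

class RiskTier(Enum):
    LOW = "low"        # read-only retrieval: safe but limited
    MEDIUM = "medium"  # structured queries, task management
    HIGH = "high"      # bash, code execution, network access

# Hypothetical registry mapping tool names to risk tiers.
TOOL_RISK = {
    "read_file": RiskTier.LOW,
    "query_tasks": RiskTier.MEDIUM,
    "bash": RiskTier.HIGH,
}

_ORDER = [RiskTier.LOW, RiskTier.MEDIUM, RiskTier.HIGH]

def requires_approval(tool_name: str,
                      auto_approve_up_to: RiskTier = RiskTier.LOW) -> bool:
    """Return True if this tool call should be held for explicit user approval.

    Unknown tools default to HIGH risk: the safe failure mode is to ask."""
    tier = TOOL_RISK.get(tool_name, RiskTier.HIGH)
    return _ORDER.index(tier) > _ORDER.index(auto_approve_up_to)
```

The key design choice is the default: anything not explicitly classified is treated as high risk, so forgetting to register a tool fails toward asking the user rather than toward silent execution.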
2️⃣ Structured Questioning – AskUserQuestion
Agents often need clarification, but plain‑text questions suffer from low bandwidth. Claude Code’s goal was to lower this friction.
Attempt 1: Embed questions in ExitPlanTool
Adding a question parameter to the planning tool confused the model about whether it was planning or asking, leading to role clashes.
Attempt 2: Enforce strict Markdown format
Requiring a rigid markdown schema made the system brittle; any deviation broke parsing.
Attempt 3: Dedicated AskUserQuestion tool
This single‑purpose tool presents a structured UI, blocks the agent loop until the user answers, produces a fixed, parse‑able output, and can be reused across SDKs and skills.
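A hedged sketch of what such a tool's contract might look like, written as an Anthropic‑style tool definition. The field names, limits, and the answer normalizer are assumptions for illustration, not Claude Code's actual AskUserQuestion API.

```python
# Illustrative input schema for a single-purpose question tool: short question,
# a small set of mutually exclusive options, and an optional free-form escape hatch.
ASK_USER_QUESTION_SCHEMA = {
    "name": "AskUserQuestion",
    "description": "Ask the user one clarifying question and block until answered.",
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string", "maxLength": 200},
            "options": {
                "type": "array",
                "items": {"type": "string"},
                "minItems": 2,
                "maxItems": 4,
            },
            "allow_free_form": {"type": "boolean", "default": True},
        },
        "required": ["question", "options"],
    },
}

def parse_answer(options: list[str], raw: str) -> str:
    """Normalize the user's reply to one of the fixed options when possible,
    so downstream code always receives a parse-able value."""
    for opt in options:
        if raw.strip().lower() == opt.lower():
            return opt
    return raw.strip()  # free-form fallback
```

Because the output side is fixed and structured, the agent loop never has to guess what shape the answer will take, which is exactly what the two failed attempts above lacked.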
3️⃣ From Todos to Tasks
Early Claude Code used a TodoWrite tool to keep the model on track. As the model grew stronger, the todo list became a restrictive script, especially with sub‑agents.
Switching to a Task tool transformed the workflow into a collaborative protocol:
Todos: a linear list; good for single‑threaded, anti‑drift scenarios.
Tasks: a DAG with dependencies, state sync, output archiving, and rollback; essential for multi‑agent coordination.
Tools inevitably expire; when the model’s capabilities evolve, old tools must be revisited.
4️⃣ Progressive Context Building
Claude Code initially relied on RAG vector retrieval, which required heavy indexing and still fed the model static context. To let the model find information itself, a Grep‑style tool was added, later formalized as Skills under the principle of progressive disclosure:
Provide an entry point, let the model fetch files layer by layer.
Never load the entire knowledge base into the prompt.
Knowledge layering example:
Layer 0 – Index (200‑500 chars): list capabilities and entry points.
Layer 1 – Pattern cards (500‑1500 chars): checklists, examples, negative cases; must be executable.
Layer 2 – Full manual (>2000 chars): loaded only when needed.
Recursive reading should be bounded by two metrics: search depth and payoff. Stop when deeper searches no longer improve answer quality.
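The layering above can be sketched as a depth-bounded fetch over a linked knowledge base. The `KB` structure and its contents are hypothetical; a real implementation would also apply the payoff bound by scoring whether each deeper layer actually improved the answer.

```python
# Hypothetical layered knowledge base: the index points to pattern cards,
# which point to full manuals. Nothing below Layer 0 enters the prompt
# unless the agent walks to it.
KB = {
    "index":         {"text": "Capabilities: deploy, rollback",
                      "children": ["deploy_card"]},
    "deploy_card":   {"text": "Checklist: build, test, ship",
                      "children": ["deploy_manual"]},
    "deploy_manual": {"text": "Full deployment manual ...",
                      "children": []},
}

def load_context(entry: str, max_depth: int = 2) -> list[str]:
    """Breadth-first fetch, bounded by search depth. max_depth=0 loads
    only the index; each extra level pulls in the next layer of files."""
    texts, frontier = [], [entry]
    for _ in range(max_depth + 1):
        if not frontier:
            break
        texts.extend(KB[key]["text"] for key in frontier)
        frontier = [child for key in frontier for child in KB[key]["children"]]
    return texts
```

With `max_depth=0` the agent sees only the 200‑500 character index; raising the bound progressively discloses cards and manuals instead of front-loading the whole knowledge base.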
5️⃣ Seven‑Step Iterative Tool Design Loop
Find friction in real model outputs.
Pick the smallest lever (prompt tweak before adding a tool).
Make the interface narrow – one tool, one responsibility.
Give structure to the machine (schemas, enums, defaults).
Make failures recoverable (retries, rollbacks).
Persist outputs as reviewable files.
Periodically audit tools for obsolescence after model upgrades.
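Step 5 (recoverable failures) is the one most often skipped, so here is a minimal sketch of a cheap retry wrapper. The shape is one possible interpretation of "recoverable", assumed for illustration rather than taken from Claude Code.

```python
import time

def run_recoverably(action, max_retries: int = 2, backoff: float = 0.0):
    """Run a tool action with cheap retries and simple linear backoff.

    Transient failures are retried; once retries are spent the exception
    is re-raised, so a persistent failure stays visible to the agent loop
    instead of being silently swallowed."""
    for attempt in range(max_retries + 1):
        try:
            return action()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(backoff * (attempt + 1))
```

Pairing this with step 6 (persisting outputs as files) means a retried action can also be replayed and inspected after the fact.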
6️⃣ Decision Table – When to Add a Tool
Unstable phrasing or occasional format drift → tweak prompts, low cost, no action‑space change.
Need stable, parseable output → add a tool with schema and enums.
Large but rarely used knowledge → layered files with progressive disclosure.
Complex lookup (docs, code) with a pattern → sub‑agent with search strategy.
7️⃣ Anti‑Patterns
One tool trying to plan, ask, and execute simultaneously.
Relying on the model to always output a strict text format.
Treating a Todo list as an immutable script.
Embedding an entire document in the system prompt.
Prioritizing raw capability over recoverability.
8️⃣ Tool Design Is an Art, Not a Science
Effective tool design requires continuous experimentation, output analysis, and incremental improvements, always returning to the principle of “seeing like an agent.”
Practical Checklist (12 items)
Provide a structured “Ask” entry point (AskUserQuestion or equivalent).
Design the UI contract: short questions, mutually exclusive options, defaults, optional free‑form input.
Replace static Todos with collaborative Tasks.
Ensure Tasks express dependencies, state, and output locations.
Give the model a self‑search tool (Grep) for code‑base context.
Organize knowledge in layered files (index, pattern cards, full manual).
Apply progressive disclosure – keep rarely used info out of the prompt.
Limit tool count – each new tool adds a failure point.
Make important actions replayable (logs, traces, parameters).
Optimize for recoverability (cheap retries, clear preconditions, observable state).
Persist deliverables as files for review.
After each model upgrade, reassess and prune outdated tools.
Source
Original post by Thariq ( @trq212 ) on X:
https://x.com/trq212/status/2027463795355095314