Mastering AI Agents: 12 Actionable Practices for Effective Tool Design
This article distills a year of trial‑and‑error from the Claude Code team into a practical framework for building AI agents, covering action‑space design, structured questioning, task management, progressive context disclosure, iterative tool engineering, common anti‑patterns, and a ready‑to‑use checklist of twelve development tips.
Action Space: Defining the Agent’s Capabilities
Claude Code treats the action space as the exhaustive list of operations a model can perform. Each tool adds a new capability and a corresponding risk. Tools are grouped into three risk/ability tiers, and the team limits the core set to roughly 20 tools, adding a new tool only if it demonstrably reduces failure risk.
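A minimal sketch of what such a tiered, capped registry could look like; the tier names, the `Tool` fields, and the `reduces_failure_risk` gate are illustrative assumptions, not Claude Code's actual API:

```python
from dataclasses import dataclass
from enum import Enum

class RiskTier(Enum):
    READ_ONLY = 1      # e.g. read a file, search the repo
    REVERSIBLE = 2     # e.g. edit a file that version control can restore
    IRREVERSIBLE = 3   # e.g. run arbitrary shell commands, deploy

@dataclass
class Tool:
    name: str
    tier: RiskTier
    description: str

MAX_CORE_TOOLS = 20  # the rough cap the team holds itself to

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool, reduces_failure_risk: bool) -> None:
        # A new tool must earn its slot: it has to make a known failure
        # mode more controllable, not merely add capability.
        if not reduces_failure_risk:
            raise ValueError(f"{tool.name}: no demonstrated risk reduction")
        if len(self._tools) >= MAX_CORE_TOOLS:
            raise ValueError("core set is full; retire a tool first")
        self._tools[tool.name] = tool
```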
Structured Questioning: Reducing Human‑Machine Communication Overhead
Plain‑text prompts lead to long, low‑efficiency dialogues. After two failed attempts, the team introduced a dedicated AskUserQuestion tool that presents a UI for short questions with mutually exclusive options, a default choice, and an optional free‑form field. This guarantees parsable responses, and the tool can be reused across SDKs and Skills.
Attempt 1: Embedded questions in a planning tool, causing role conflict.
Attempt 2: Enforced strict Markdown output; minor model variations broke parsing.
Attempt 3: Deployed the single‑purpose AskUserQuestion tool with a structured UI.
Design guidelines: keep questions brief, make options mutually exclusive, provide defaults, and allow free‑form overrides.
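A sketch of what the tool's input contract might look like, using hypothetical `Question` and `Option` types (the article does not publish the real schema):

```python
from dataclasses import dataclass

@dataclass
class Option:
    label: str               # short; mutually exclusive with its siblings
    is_default: bool = False

@dataclass
class Question:
    text: str                      # keep it brief: one decision per question
    options: list[Option]
    allow_free_form: bool = True   # escape hatch when no option fits

# One structured question instead of an open-ended follow-up prompt.
question = Question(
    text="Which package manager should this project use?",
    options=[Option("npm", is_default=True), Option("pnpm"), Option("yarn")],
)
```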
Task Management: From Linear Todo to DAG‑Based Task
Early agents used linear Todo lists, which suit single‑threaded execution. Once multiple agents collaborate, the Todo list becomes a bottleneck. The Task tool instead models work as a directed acyclic graph (DAG), supporting dependency expression, state synchronization, artifact archiving, and failure rollback; a minimal sketch follows the comparison below.
Todo: Linear, single‑agent, reminder‑only.
Task: DAG‑based, multi‑agent, enables complex coordination.
A tool that made sense for one model generation expires as capabilities evolve; it should be refactored or replaced, not left in place.
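To make the contrast concrete, here is a minimal sketch of a DAG‑based task structure with dependency checks and failure rollback; the `Task` fields and `rollback` semantics are illustrative assumptions, not the actual Task tool:

```python
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

@dataclass
class Task:
    id: str
    description: str
    depends_on: list[str] = field(default_factory=list)  # DAG edges
    state: State = State.PENDING
    artifacts: list[str] = field(default_factory=list)   # archived outputs

def ready_tasks(tasks: dict[str, Task]) -> list[Task]:
    """Tasks whose dependencies are all DONE can be claimed by any agent."""
    return [
        t for t in tasks.values()
        if t.state is State.PENDING
        and all(tasks[d].state is State.DONE for d in t.depends_on)
    ]

def rollback(tasks: dict[str, Task], task_id: str) -> None:
    """On failure, reset the task and everything downstream of it."""
    tasks[task_id].state = State.PENDING
    for t in tasks.values():
        if task_id in t.depends_on and t.state is not State.PENDING:
            rollback(tasks, t.id)
```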
Context Construction: Progressive Disclosure
From RAG to Grep
RAG (vector store retrieval) requires continuous indexing and is fragile in complex environments. Grep‑style search works well for well‑structured data such as codebases or logs but cannot replace RAG for unstructured multimodal data. The industry now adopts a hybrid “RAG + Grep” approach.
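A sketch of the hybrid approach, assuming a hypothetical `vector_store` object with a `search(query, top_k)` method returning documents with a `.text` field; the merge strategy here is deliberately naive:

```python
import subprocess

def grep_search(pattern: str, path: str) -> list[str]:
    """Literal/regex search over structured data: code, logs, configs."""
    result = subprocess.run(
        ["grep", "-rn", pattern, path],
        capture_output=True, text=True,
    )
    return result.stdout.splitlines()

def hybrid_retrieve(query: str, pattern: str, path: str, vector_store) -> list[str]:
    # Grep first: cheap, index-free, and precise on well-structured data.
    hits = grep_search(pattern, path)
    # Add vector retrieval for unstructured or paraphrased content
    # that a literal search will miss.
    hits += [doc.text for doc in vector_store.search(query, top_k=5)]
    return hits
```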
Progressive Disclosure
Instead of loading all knowledge into the system prompt, the model receives an entry point to explore a hierarchy: index → pattern cards → full manual. Deeper layers are loaded only when needed, reducing token waste and improving inference efficiency.
Knowledge Layering
Layer 0 – Index (≈200‑500 words): Quick navigation of capabilities.
Layer 1 – Pattern Cards (≈500‑1500 words): Concrete usage examples and negative cases.
Layer 2 – Full Manual (≥2000 words): Detailed reference loaded on demand.
Monitor search depth and reward ratio; stop deeper searches when answer quality no longer improves.
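One way this could be wired up, assuming a hypothetical on‑disk layout that mirrors the three layers:

```python
from pathlib import Path

# Assumed layout mirroring the three layers:
#   knowledge/index.md             Layer 0: ~200-500 words, always loaded
#   knowledge/patterns/<name>.md   Layer 1: ~500-1500 words each
#   knowledge/manual/<name>.md     Layer 2: full reference, on demand
ROOT = Path("knowledge")

def load_index() -> str:
    """Always in context: just enough to navigate the capabilities."""
    return (ROOT / "index.md").read_text()

def load_pattern_card(name: str) -> str:
    """Loaded when the index points at a relevant capability."""
    return (ROOT / "patterns" / f"{name}.md").read_text()

def load_manual(name: str) -> str:
    """Loaded only when a pattern card proves insufficient."""
    return (ROOT / "manual" / f"{name}.md").read_text()
```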
Tool Design Iteration: 7‑Step Loop
Find Friction: Identify exact failure points from model logs.
Select Minimal Leverage: Prefer prompt tweaks over new tools; avoid tool duplication.
Build Narrow Interfaces: Each tool does one thing with clear boundaries.
Use Structured Data: Define schemas, enums, and defaults for machine‑readable output.
Enable Recovery: Add retry, undo, and rollback mechanisms.
Persist Outputs: Write results to files for auditability (steps 5 and 6 are sketched together after this list).
Review Regularly: After model upgrades, reassess and replace expired tools.
Before adding a tool, ask whether it makes failure modes more controllable; otherwise it is likely noise.
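Steps 5 and 6 might look like the following sketch; the backoff policy, output directory, and JSON envelope are illustrative choices, not prescribed by the article:

```python
import json
import time
from pathlib import Path

def run_with_recovery(action, max_retries: int = 2, outdir: str = "runs"):
    """Retry transient failures, then persist the result for audit."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            result = action()
            break
        except Exception as e:  # in practice, catch the narrowest type you can
            last_error = e
            time.sleep(2 ** attempt)  # simple exponential backoff
    else:
        raise last_error

    # Persist the output so humans (and later agents) can audit or reuse it.
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    path = out / f"{int(time.time())}.json"
    path.write_text(json.dumps({"result": result}, default=str))
    return result
```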
Common Anti‑Patterns (6)
Multi‑purpose Tools: Mixing planning, questioning, and execution confuses the model.
Fixed Text Formats: Relying on strict output formats breaks in production.
Todo as Immutable Script: Treating static Todo lists as unchangeable scripts limits autonomy.
Embedding Large Documents in Prompts: Causes context decay and high maintenance cost.
Chasing Capability Limits Without Recoverability: Errors become unrecoverable.
Blind Grep Replacement for RAG: Using Grep on unstructured data degrades retrieval quality.
Implementation Checklist (12 Reusable Tips)
Provide a structured questioning UI (e.g., AskUserQuestion) with short questions, exclusive options, defaults, and free‑form input.
Replace static Todo lists with a Task tool that supports DAG dependencies, state sync, artifact archiving, and rollback.
Use Grep for structured data (code, logs) and RAG for unstructured multimodal sources; combine both.
Organize knowledge in three layers (index, pattern cards, full manual) to avoid context bloat.
Apply progressive disclosure; keep rarely used information out of the active context.
Limit the number of tools; each new tool adds a potential failure point.
Log critical actions for replayability and debugging.
Design cheap recovery paths with clear preconditions and retry logic.
Persist agent outputs as files for human review and downstream reuse.
After each model upgrade, audit existing tools and replace those that have become obsolete.
Follow the 7‑step iteration loop to continuously refine tool design.
Continuously monitor search depth versus answer quality to prevent endless low‑value retrieval.
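A sketch of such a stopping rule, assuming caller‑supplied `search_step` and `score` callables; the depth cap and minimum‑gain threshold are illustrative:

```python
def search_until_plateau(search_step, score, max_depth: int = 5,
                         min_gain: float = 0.05) -> list:
    """Go one layer deeper only while answer quality keeps improving."""
    results: list = []
    best = 0.0
    for depth in range(max_depth):
        results.extend(search_step(depth))
        quality = score(results)       # e.g. coverage of the question, 0..1
        if quality - best < min_gain:  # diminishing returns: stop here
            break
        best = quality
    return results
```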