Three New Ways to Tackle Agent Context Engineering with Claude’s Tools
Anthropic’s recent release introduces three advanced capabilities—Tool Search, Programmatic Tool Calling, and Tool Use Examples—that reduce token consumption, avoid context pollution, and improve tool‑calling accuracy for AI agents, with detailed benchmarks, code samples, and guidance on when each feature is most effective.
Background and Motivation
Building efficient AI agents requires handling hundreds of tools without loading all definitions into the model’s context. Unoptimized tool definitions can consume >50,000 tokens before a conversation starts, and intermediate results can further pollute the context, leading to token exhaustion and reduced accuracy.
1. Tool Search Tool
Challenge
Tool definitions for a five‑server MCP setup (GitHub, Slack, Sentry, Grafana, Splunk) total ~55,000 tokens; adding Jira pushes the cost over 100,000 tokens.
Similarly named tools (e.g., notification-send-user vs notification-send-channel) lead to incorrect tool selection and failed calls.
Solution
The Tool Search Tool lets Claude discover tools on demand. Only the search tool itself (~500 tokens) is loaded initially; relevant tools are fetched and expanded in the context when needed.
Compared with the traditional approach (≈122,800 tokens), the search‑based workflow uses about 8.7K tokens—a 95% reduction. Internal tests show accuracy improvements: Opus 4 rises from 49% to 74% and Opus 4.5 from 79.5% to 88.1%.
{
  "tools": [
    // Tool Search Tool definition
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    // Example of a deferred-loading tool
    {
      "name": "github.createPullRequest",
      "description": "Create a pull request",
      "input_schema": {...},
      "defer_loading": true
    }
    // ... hundreds of other deferred tools
  ]
}
When to Use
Tool definitions exceed 10K tokens.
Frequent tool‑selection errors.
Systems driven by many MCP servers.
More than ten available tools.
When Not to Use
Small toolsets (<10 tools).
All tools are used in every session.
Very compact tool definitions.
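The discovery step is easy to picture with a local sketch. The registry and function below are hypothetical illustrations of the idea, not Anthropic's implementation: a regex search runs over tool names and descriptions, and only the matching definitions would be expanded into context.

```python
import re

# Hypothetical local registry standing in for hundreds of deferred tool definitions.
TOOL_REGISTRY = {
    "github.createPullRequest": "Create a pull request",
    "github.listIssues": "List issues in a repository",
    "slack.sendMessage": "Send a message to a Slack channel",
    "sentry.listAlerts": "List recent Sentry alerts",
}

def tool_search_regex(pattern: str) -> list[str]:
    """Return the names of tools whose name or description matches the pattern,
    mimicking how a regex-based tool search surfaces only relevant definitions."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        name for name, desc in TOOL_REGISTRY.items()
        if rx.search(name) or rx.search(desc)
    ]

# Only the matching definitions would be expanded into the model's context.
print(tool_search_regex(r"pull request"))  # → ['github.createPullRequest']
```

The point of the sketch: the model pays for one small search tool up front, and each query loads a handful of definitions instead of the whole registry.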
2. Programmatic Tool Calling
Challenge
Context Pollution: Analyzing a 10 MB log file or aggregating large data sets forces the entire payload into the model’s context, displacing useful information.
Inference Overhead: Each tool call triggers a full model inference pass; a workflow with five tools incurs five separate passes, increasing latency and error risk.
Solution
Programmatic Tool Calling (PTC) lets Claude generate Python code that orchestrates multiple tools inside a sandboxed code‑execution environment. Only the final result is returned to the model, eliminating intermediate data from the context.
A budget-compliance example demonstrates the approach. Claude writes code like the following, which runs in the sandbox (where top-level await is available):
import asyncio
import json

team = await get_team_members("engineering")
# Determine the unique seniority levels on the team
levels = list(set(m["level"] for m in team))
# Fetch each level's budget in parallel
budget_results = await asyncio.gather(*[get_budget_by_level(l) for l in levels])
budgets = {level: budget for level, budget in zip(levels, budget_results)}
# Fetch every member's Q3 expenses in parallel
expenses = await asyncio.gather(*[get_expenses(m["id"], "Q3") for m in team])
exceeded = []
for member, exp in zip(team, expenses):
    budget = budgets[member["level"]]
    total = sum(e["amount"] for e in exp)
    if total > budget["travel_limit"]:
        exceeded.append({"name": member["name"], "spent": total, "limit": budget["travel_limit"]})
print(json.dumps(exceeded))
Claude sees only the final JSON list of overspenders, reducing the token load from >200 KB of raw expense data to ~1 KB.
Measured benefits:
Token usage drops from 43,588 to 27,297 (‑37%).
Latency improves by eliminating >19 inference rounds in a 20‑tool workflow.
Accuracy gains: knowledge‑retrieval accuracy rises from 25.6% to 28.5%; GAIA benchmark improves from 46.5% to 51.2%.
How It Works
Include the code-execution tool in the request and set each orchestratable tool's allowed_callers to the execution tool type.
Claude generates Python code that calls the marked tools.
The code runs in a sandbox; each tool request includes a caller field linking back to the execution context.
Only the code’s final output is sent back to Claude.
{
  "tools": [
    {"type": "code_execution_20250825", "name": "code_execution"},
    {"name": "get_team_members", "description": "Get all members of a department...", "input_schema": {...}, "allowed_callers": ["code_execution_20250825"]},
    {"name": "get_expenses", ...},
    {"name": "get_budget_by_level", ...}
  ]
}
When to Use
Large datasets where only aggregated results are needed.
Workflows requiring three or more dependent tool calls.
Tasks that need filtering, sorting, or transformation before Claude sees the data.
Parallel operations across many endpoints.
When Not to Use
Simple single‑tool calls.
Tasks that require Claude to reason over every intermediate result.
Very fast, tiny lookups.
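The context-pollution scenario described earlier (analyzing a large log file) can be sketched end to end with a mocked tool. Here, fetch_log_lines and its payload are invented for illustration; in real Programmatic Tool Calling it would be an API-backed tool whose raw output stays inside the sandbox:

```python
import asyncio
import json
from collections import Counter

# Hypothetical mock tool: stands in for a real API-backed tool returning a
# large raw payload that never enters the model's context.
async def fetch_log_lines(service: str) -> list[str]:
    return ["INFO boot", "ERROR db timeout", "WARN slow query",
            "ERROR db timeout", "INFO ready"] * 1000  # stand-in for a 10 MB log

async def main():
    lines = await fetch_log_lines("checkout")
    # Aggregate inside the sandbox; only this compact summary is printed back.
    counts = Counter(line.split()[0] for line in lines)
    summary = {"total": len(lines), "by_level": dict(counts)}
    print(json.dumps(summary))
    return summary

summary = asyncio.run(main())
```

The model never sees the 5,000 raw lines, only the few dozen bytes of summary, which is the whole point of keeping orchestration in code.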
3. Tool Use Examples
Challenge
JSON Schema defines structural validity but cannot convey usage patterns, optional‑parameter conventions, or domain‑specific expectations, leading to malformed calls.
{
  "name": "create_ticket",
  "input_schema": {
    "properties": {
      "title": {"type": "string"},
      "priority": {"enum": ["low", "medium", "high", "critical"]},
      "labels": {"type": "array", "items": {"type": "string"}},
      "reporter": {
        "type": "object",
        "properties": {
          "id": {"type": "string"},
          "name": {"type": "string"},
          "contact": {"type": "object", "properties": {"email": {"type": "string"}, "phone": {"type": "string"}}}
        }
      },
      "due_date": {"type": "string"},
      "escalation": {"type": "object", "properties": {"level": {"type": "integer"}, "notify_manager": {"type": "boolean"}, "sla_hours": {"type": "integer"}}}
    },
    "required": ["title"]
  }
}
Solution
Tool Use Examples attach concrete input_examples to the definition, showing realistic calls and conventions.
{
  "name": "create_ticket",
  "input_schema": {...},
  "input_examples": [
    {
      "title": "Login page returns 500 error",
      "priority": "critical",
      "labels": ["bug", "authentication", "production"],
      "reporter": {"id": "USR-12345", "name": "Jane Smith", "contact": {"email": "[email protected]", "phone": "+1-555-0123"}},
      "due_date": "2024-11-06",
      "escalation": {"level": 2, "notify_manager": true, "sla_hours": 4}
    },
    {
      "title": "Add dark mode support",
      "labels": ["feature-request", "ui"],
      "reporter": {"id": "USR-67890", "name": "Alex Chen"}
    },
    {
      "title": "Update API documentation"
    }
  ]
}
Internal tests show accuracy for handling complex parameters rises from 72% to 90% when examples are provided.
When to Use
Complex nested structures where JSON validity does not guarantee correct usage.
Tools with many optional parameters and domain‑specific conventions.
APIs where schema cannot capture subtle usage rules.
Similar tools that need disambiguation (e.g., create_ticket vs create_incident).
When Not to Use
Simple single‑parameter tools with obvious usage.
Standard formats already understood by Claude (URLs, emails).
Validation problems that JSON Schema can enforce.
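Before shipping examples, it is worth checking that each one actually satisfies the schema it illustrates. The helper below is a hypothetical sketch covering only required fields and enum membership; a production pipeline might run a full JSON Schema validator instead:

```python
# Hypothetical helper: sanity-check a tool's input_examples against its schema.
# A minimal sketch covering required fields and enums only.
def check_examples(tool: dict) -> list[str]:
    schema = tool["input_schema"]
    problems = []
    for i, ex in enumerate(tool.get("input_examples", [])):
        for field in schema.get("required", []):
            if field not in ex:
                problems.append(f"example {i}: missing required '{field}'")
        for field, spec in schema.get("properties", {}).items():
            if field in ex and "enum" in spec and ex[field] not in spec["enum"]:
                problems.append(f"example {i}: '{field}' not in enum")
    return problems

tool = {
    "name": "create_ticket",
    "input_schema": {
        "properties": {"title": {"type": "string"},
                       "priority": {"enum": ["low", "medium", "high", "critical"]}},
        "required": ["title"],
    },
    "input_examples": [
        {"title": "Login page returns 500 error", "priority": "critical"},
        {"priority": "urgent"},  # bad: missing title, invalid enum value
    ],
}
print(check_examples(tool))
# → ["example 1: missing required 'title'", "example 1: 'priority' not in enum"]
```

Catching a malformed example at build time is far cheaper than debugging why the model keeps emitting an out-of-enum value at run time.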
Best Practices
Combine the three features strategically: start with the most pressing bottleneck, add others as needed, and keep the overall system lightweight.
Use Tool Search to avoid context bloat from massive tool libraries.
Apply Programmatic Tool Calling to eliminate intermediate result pollution.
Provide Tool Use Examples to guide correct parameter usage.
Example of a well‑crafted search definition:
{
  "name": "search_customer_orders",
  "description": "Search for customer orders by date range, status, or total amount. Returns order details including items, shipping, and payment info."
}
System prompt snippet to expose capabilities:
You have access to tools for Slack messaging, Google Drive file management, Jira issue tracking, and GitHub repository operations. Use tool search to find specific functionality.
Mark frequently used tools with defer_loading: false and set the rest to true to balance instant access against token savings.
For programmatic calling, document return formats clearly so Claude can generate correct parsers:
{
  "name": "get_orders",
  "description": "Retrieve orders for a customer.",
  "returns": "List of order objects with id (str), total (float), status (str), items (list of {sku, quantity, price}), created_at (ISO-8601 timestamp)"
}
When crafting Tool Use Examples, include real-world data, cover minimal, partial, and full parameter sets, and limit each tool to 1-5 examples focused on ambiguous cases.
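The payoff of a documented return format is that generated code can parse it without guessing. A small sketch, using invented order data shaped exactly as the description above specifies:

```python
from datetime import datetime

# Invented sample payload matching the documented format: id (str), total (float),
# status (str), items (list of {sku, quantity, price}), created_at (ISO-8601).
orders = [
    {"id": "ORD-1", "total": 49.5, "status": "shipped",
     "items": [{"sku": "A1", "quantity": 2, "price": 24.75}],
     "created_at": "2024-11-01T09:30:00+00:00"},
    {"id": "ORD-2", "total": 15.0, "status": "pending",
     "items": [{"sku": "B2", "quantity": 1, "price": 15.0}],
     "created_at": "2024-11-03T14:00:00+00:00"},
]

# Because the field names and types are documented, aggregation code is unambiguous.
shipped_total = sum(o["total"] for o in orders if o["status"] == "shipped")
latest = max(orders, key=lambda o: datetime.fromisoformat(o["created_at"]))
print(shipped_total, latest["id"])  # → 49.5 ORD-2
```

Without the documented format, the generated parser would have to guess field names and date encodings, which is a common source of silent orchestration bugs.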