How Claude Code 2.1.69 Cuts Context Usage by 95% with Lazy‑Loaded Tool Search
Claude Code 2.1.69 introduces a lazy‑loaded Tool Search that defers loading of Model Context Protocol (MCP) tools until needed, dropping system‑tool context from around 10% to zero, dramatically reducing token consumption and improving tool selection accuracy.
What changed in version 2.1.69?
Version 2.1.69 adds Tool Search. Tool definitions are no longer loaded at startup; each one is loaded only when a task calls for it, so the share of context taken up by system tools falls from about 10% to roughly zero.
Why the previous approach was problematic
Earlier versions loaded all MCP tool definitions at startup. With many integrations (GitHub, Slack, Jira, etc.) the tool payload could exceed 30 K tokens, taking up a substantial share of the context window before any work began and leading to slower responses and tool-selection errors.
Example 3‑server setup: GitHub 35 tools (~26 K tokens), Context7 2 tools (~3 K), Exa 2 tools (~2 K) → total ~31 K tokens.
Adding Slack, Jira, Linear can push the total beyond 100 K tokens.
Tool Search mechanism
Two search strategies are offered:
1. Regex search
Claude generates a Python regular expression that is matched against tool names, descriptions, and parameters. Examples:
"weather": finds weather‑related tools
"get_.*_data": finds tools whose names contain "get_" followed later by "_data"
"(?i)slack": finds Slack tools, ignoring case
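The regex strategy can be sketched in a few lines of Python. This is a minimal illustration, not Claude Code's actual implementation: the TOOLS catalog, its field layout, and the regex_search helper are all hypothetical, and only the three patterns come from the examples above.

```python
import re

# Hypothetical tool catalog; names, descriptions and parameters are illustrative only.
TOOLS = [
    {"name": "get_weather_data", "description": "Fetch the current weather for a city", "parameters": ["city"]},
    {"name": "get_user_data", "description": "Look up a user profile", "parameters": ["user_id"]},
    {"name": "slack_send_message", "description": "Send a message to a Slack channel", "parameters": ["channel", "text"]},
    {"name": "github_create_pr", "description": "Create a pull request on GitHub", "parameters": ["repo", "title"]},
]


def regex_search(pattern, tools, limit=5):
    """Return the names of tools whose name, description, or parameter names match."""
    compiled = re.compile(pattern)
    hits = []
    for tool in tools:
        fields = [tool["name"], tool["description"], *tool["parameters"]]
        # re.search matches anywhere in the field, so patterns are unanchored by default.
        if any(compiled.search(field) for field in fields):
            hits.append(tool["name"])
    return hits[:limit]


print(regex_search("weather", TOOLS))
print(regex_search("get_.*_data", TOOLS))
print(regex_search("(?i)slack", TOOLS))
```

Because re.search is unanchored, a pattern such as "get_.*_data" matches anywhere inside a field; adding ^ and $ would restrict it to whole names.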
2. BM25 search
Claude writes a natural‑language query, and the available tools are ranked against it using the BM25 algorithm. Example queries: “send a message to a user”, “create a pull request on GitHub”, “search customer orders by date”.
Both methods return the top 3‑5 most relevant tools, which are then fully expanded; the rest remain unloaded.
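To make the BM25 strategy concrete, here is a small self-contained ranking sketch using the standard Okapi BM25 formula. It is an assumption-laden illustration: the DOCS catalog, the tokenize and bm25_rank helpers, and the k1 and b defaults are hypothetical choices, not details taken from Claude Code.

```python
import math
import re
from collections import Counter

# Hypothetical tool descriptions, keyed by tool name.
DOCS = {
    "slack_send_message": "Send a message to a user or channel in Slack",
    "github_create_pr": "Create a pull request on GitHub",
    "orders_search": "Search customer orders by date range",
    "github_list_issues": "List open issues on GitHub",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75, top_k=5):
    """Rank tool names by Okapi BM25 score against the query, best first."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many descriptions each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        if score > 0:
            scores[name] = score

    # Only the top_k best-scoring tools would be expanded; the rest stay unloaded.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


print(bm25_rank("create a pull request on GitHub", DOCS))
```

Rare terms such as "pull" and "request" carry more weight than common ones such as "a" or "on", which is why the pull-request tool outranks the other GitHub tool for this query.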
Impact on token usage
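Traditional loading consumes about 77 K tokens (tool definitions 72 K + system prompt 5 K). Tool Search reduces this to roughly 8.7 K tokens (search tool 0.5 K + discovered tools 3 K + system prompt 5 K, which sum to 8.5 K), a saving of about 89 %. The 95 % figure is best read as the share of a 200 K‑token context window that remains free.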
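A quick way to sanity-check these figures is to recompute them from the quoted components. The sketch below uses only the token counts given above; the 200 K-token context window is an assumption that the article does not state.

```python
# Token counts quoted in the article.
TOOL_DEFINITIONS = 72_000
SYSTEM_PROMPT = 5_000
SEARCH_TOOL = 500
DISCOVERED_TOOLS = 3_000

# Assumption: a 200 K-token context window (not stated in the article).
CONTEXT_WINDOW = 200_000

traditional = TOOL_DEFINITIONS + SYSTEM_PROMPT
with_search = SEARCH_TOOL + DISCOVERED_TOOLS + SYSTEM_PROMPT

saving = 1 - with_search / traditional          # reduction relative to traditional loading
window_free = 1 - with_search / CONTEXT_WINDOW  # share of the window left for real work

print(f"traditional: {traditional} tokens")
print(f"with Tool Search: {with_search} tokens")
print(f"saving: {saving:.1%}")
print(f"window left free: {window_free:.1%}")
```

Under these assumptions the reduction relative to traditional loading comes out near 89 %, while the share of the window left free is just under 96 %.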
defer_loading flag
Adding defer_loading: true to a tool definition prevents it from loading at startup. Example JSON snippet:
{
"name": "github-create-pr",
"description": "Create a pull request on GitHub",
"input_schema": { ... },
"defer_loading": true
}
Best practice: keep 3‑5 frequently used tools with defer_loading: false (or omit the flag) and set defer_loading: true for all others.
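The effect of the flag can be illustrated with a small partitioning sketch. Only the defer_loading field comes from the snippet above; the tool list and the split_by_defer_loading helper are hypothetical and do not reflect how Claude Code handles definitions internally.

```python
# Hypothetical tool definitions; only the "defer_loading" field comes from the article.
TOOL_DEFINITIONS = [
    {"name": "github-create-pr", "description": "Create a pull request on GitHub", "defer_loading": True},
    {"name": "read-file", "description": "Read a file from the workspace"},
    {"name": "slack-send-message", "description": "Send a Slack message", "defer_loading": True},
    {"name": "run-tests", "description": "Run the project test suite", "defer_loading": False},
]


def split_by_defer_loading(tools):
    """Split tool names into those loaded at startup and those left for search."""
    upfront, deferred = [], []
    for tool in tools:
        # A missing flag is treated the same as defer_loading: false.
        if tool.get("defer_loading", False):
            deferred.append(tool["name"])
        else:
            upfront.append(tool["name"])
    return upfront, deferred


upfront, deferred = split_by_defer_loading(TOOL_DEFINITIONS)
print("loaded at startup:", upfront)
print("discoverable via search:", deferred)
```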
How to enable
If you are running Claude Code 2.1.69 or later, Tool Search is enabled automatically. The system detects when MCP tools exceed 10 % of the context window and switches to lazy loading without any configuration changes.
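The automatic switch described above amounts to a simple threshold test. The following sketch is only a guess at its shape: the should_defer_tools function, its parameters, and the 200 K-token window in the usage lines are hypothetical, and only the 10 % threshold comes from the text.

```python
def should_defer_tools(tool_tokens, context_window, threshold_percent=10):
    """Return True when tool definitions exceed the given share of the context window."""
    # Integer arithmetic avoids floating-point surprises right at the threshold.
    return tool_tokens * 100 > threshold_percent * context_window


# Assumption: a 200 K-token context window.
print(should_defer_tools(31_000, 200_000))  # the three-server example from earlier
print(should_defer_tools(5_000, 200_000))   # a small tool set stays eagerly loaded
```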
Takeaway
Anthropic’s engineering shows that efficient context management, not merely larger windows, drives performance. Tool Search embodies this principle by loading only the tools an AI actually needs.
Java Architecture Diary
