How Claude Code 2.1.69 Cuts Context Usage by 95% with Lazy‑Loaded Tool Search

Claude Code 2.1.69 introduces a lazy‑loaded Tool Search that defers loading Model Context Protocol (MCP) tool definitions until they are needed. This drops system‑tool context from around 10% to zero, which reduces token consumption and improves tool selection accuracy.

Java Architecture Diary

What changed in version 2.1.69?

Claude Code 2.1.69 introduces Tool Search, which defers loading of tool definitions until they are needed, reducing system‑tool context from ~10% to 0%.

Why the previous approach was problematic

Earlier versions loaded all MCP tool definitions at startup. With many integrations (GitHub, Slack, Jira, etc.) the tool payload could exceed 30 K tokens before any work began, taking up a large share of the context window and leading to slower responses and tool selection errors.

Example 3‑server setup: GitHub 35 tools (~26 K tokens), Context7 2 tools (~3 K), Exa 2 tools (~2 K) → total ~31 K tokens.
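The tally for this example setup can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the per-server figures are the approximate ones quoted above, and the 200 K-token window size is an assumption that the article does not state.

```python
# Approximate per-server figures quoted above: (tool count, tokens).
servers = {
    "GitHub": (35, 26_000),
    "Context7": (2, 3_000),
    "Exa": (2, 2_000),
}

# Assumption: a 200 K-token context window (not stated in the article).
CONTEXT_WINDOW = 200_000

total_tools = sum(count for count, _ in servers.values())
total_tokens = sum(tokens for _, tokens in servers.values())
window_share = total_tokens / CONTEXT_WINDOW

print(f"{total_tools} tools, ~{total_tokens:,} tokens")
print(f"share of an assumed {CONTEXT_WINDOW:,}-token window: {window_share:.1%}")
```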

Adding Slack, Jira, and Linear can push the total beyond 100 K tokens.

Tool Search mechanism

Two search strategies are offered:

1. Regex search

Claude generates a Python regular expression that is matched against tool names, descriptions, and parameters. Examples:

"weather" – finds weather‑related tools
"get_.*_data" – finds tools whose names start with "get_" and end with "_data"
"(?i)slack" – finds Slack tools, case‑insensitively
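A minimal sketch of how such a pattern could be applied to a tool catalog, using Python's standard re module. The catalog, the field names, and the regex_search helper are hypothetical and only illustrate the idea; they are not Claude Code's actual implementation.

```python
import re

# Hypothetical tool catalog, for illustration only.
TOOLS = [
    {"name": "get_weather_data", "description": "Fetch the current weather for a city",
     "parameters": ["city"]},
    {"name": "get_stock_data", "description": "Fetch a stock quote",
     "parameters": ["ticker"]},
    {"name": "slack_send_message", "description": "Send a message to a Slack channel",
     "parameters": ["channel", "text"]},
    {"name": "github_create_pr", "description": "Create a pull request on GitHub",
     "parameters": ["repo", "title"]},
]


def regex_search(pattern, tools, limit=5):
    """Return names of tools whose name, description, or parameters match."""
    compiled = re.compile(pattern)
    hits = []
    for tool in tools:
        fields = [tool["name"], tool["description"], *tool["parameters"]]
        # re.search matches anywhere in the string, not only at the start.
        if any(compiled.search(text) for text in fields):
            hits.append(tool["name"])
    return hits[:limit]


print(regex_search("weather", TOOLS))
print(regex_search("get_.*_data", TOOLS))
print(regex_search("(?i)slack", TOOLS))
```

Only the names returned by the search would then need their full definitions loaded into context.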

2. BM25 search

Claude issues a natural‑language query, and the available tools are ranked against it using the BM25 algorithm. Example queries: “send a message to a user”, “create a pull request on GitHub”, “search customer orders by date”.
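To make the ranking step concrete, here is a small self-contained BM25 scorer over tool descriptions. It is a sketch under stated assumptions: the tool names and descriptions are hypothetical, the k1 and b values are common defaults rather than anything the article specifies, and the IDF uses the always-positive variant popularized by Lucene.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75, top_k=3):
    """Rank docs (name -> description) against the query with BM25."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many descriptions each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[name] = score

    ranked = sorted(scores, key=scores.get, reverse=True)
    # Drop tools that share no term with the query.
    return [name for name in ranked if scores[name] > 0][:top_k]


# Hypothetical tool descriptions, for illustration only.
DOCS = {
    "slack_send_message": "Send a message to a user or channel on Slack",
    "github_create_pr": "Create a pull request on GitHub",
    "orders_search": "Search customer orders by date range",
    "jira_create_issue": "Create an issue in Jira",
}

print(bm25_rank("create a pull request on GitHub", DOCS))
print(bm25_rank("send a message to a user", DOCS))
```

Unlike the regex strategy, this ranking needs no exact pattern: a query only has to share informative terms with a tool's description.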

Both methods return the top 3‑5 most relevant tools, which are then fully expanded; the rest remain unloaded.

Impact on token usage

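Traditional loading consumes about 77 K tokens (tool definitions 72 K + system prompt 5 K). Tool Search reduces this to roughly 8.7 K tokens (search tool 0.5 K + discovered tools 3 K + system prompt 5 K; the listed components sum to 8.5 K). That is a reduction of about 89 % in tokens consumed. Assuming a 200 K‑token context window, it also leaves roughly 95 % of the window free, which appears to be the source of the 95 % figure.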
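The figures above can be sanity-checked by summing the quoted components. The sketch below uses only the numbers from this section; the 200 K-token window is an assumption, since the article does not state a window size.

```python
# Token budgets as quoted in this section.
traditional = {"tool_definitions": 72_000, "system_prompt": 5_000}
with_search = {"search_tool": 500, "discovered_tools": 3_000, "system_prompt": 5_000}

# Assumption: a 200 K-token context window (not stated in the article).
CONTEXT_WINDOW = 200_000

before = sum(traditional.values())
after = sum(with_search.values())

reduction = 1 - after / before            # fraction of tokens saved
window_free = 1 - after / CONTEXT_WINDOW  # fraction of the window left free

print(f"before: {before:,} tokens, after: {after:,} tokens")
print(f"token reduction: {reduction:.1%}")
print(f"window left free (assumed {CONTEXT_WINDOW:,}): {window_free:.1%}")
```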

defer_loading flag

Adding defer_loading: true to a tool definition prevents it from loading at startup. Example JSON snippet:

{
  "name": "github-create-pr",
  "description": "Create a pull request on GitHub",
  "input_schema": { ... },
  "defer_loading": true
}

Best practice: keep 3‑5 frequently used tools with defer_loading: false (or omit the flag) and set defer_loading: true for all others.
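The best practice above amounts to partitioning the tool list on the defer_loading flag. The following sketch shows that split; the tool names other than github-create-pr and the split_by_defer_loading helper are hypothetical, and a missing flag is treated as false, as described above.

```python
def split_by_defer_loading(tool_definitions):
    """Split tools into (loaded upfront, deferred); a missing flag means upfront."""
    upfront, deferred = [], []
    for tool in tool_definitions:
        if tool.get("defer_loading", False):
            deferred.append(tool)
        else:
            upfront.append(tool)
    return upfront, deferred


# Hypothetical tool definitions; input schemas are omitted for brevity.
tools = [
    {"name": "read-file", "description": "Read a file"},
    {"name": "run-tests", "description": "Run the test suite", "defer_loading": False},
    {"name": "github-create-pr", "description": "Create a pull request on GitHub",
     "defer_loading": True},
    {"name": "jira-create-issue", "description": "Create an issue in Jira",
     "defer_loading": True},
]

upfront, deferred = split_by_defer_loading(tools)
print("loaded at startup:", [t["name"] for t in upfront])
print("discoverable via search:", [t["name"] for t in deferred])
```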

How to enable

If you are running Claude Code 2.1.69 or later, Tool Search is enabled automatically. The system detects when MCP tools exceed 10 % of the context window and switches to lazy loading without any configuration changes.
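The threshold check described here can be pictured as follows. This is purely illustrative: the four-characters-per-token estimate, the 200 K-token window, and both function names are assumptions, and Claude Code's real detection logic is not shown in the article.

```python
import json


def estimate_tokens(tool_definitions):
    """Rough token estimate (assumption: about 4 characters per token)."""
    return len(json.dumps(tool_definitions)) // 4


def should_lazy_load(tool_definitions, context_window=200_000, threshold=0.10):
    """Return True when tool definitions exceed the given share of the window."""
    return estimate_tokens(tool_definitions) > threshold * context_window


# Hypothetical catalogs: one tiny, one with 300 verbose tool definitions.
small_catalog = [{"name": "read-file", "description": "Read a file"}]
large_catalog = [
    {"name": f"tool_{i}", "description": "x" * 400} for i in range(300)
]

print(should_lazy_load(small_catalog))
print(should_lazy_load(large_catalog))
```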

Takeaway

Anthropic’s engineering shows that efficient context management, not merely larger windows, drives performance. Tool Search embodies this principle by loading only the tools an AI actually needs.
