
MCP Integration Deep Dive: Prompt Cache Stability and Tool Ordering Explained

The article analyzes why connecting an MCP server can triple response latency and multiply token usage several times over. It explains how unstable tool ordering breaks Anthropic's prompt cache, and provides detailed code walkthroughs, design insights, common pitfalls, and concrete best‑practice recommendations for building reliable MCP integrations.

James' Growth Diary

May 16, 2026


01 Why does connecting an MCP Server triple the response time?

When the author added an MCP server, response time grew from 30 s to 90 s and token consumption jumped from ~20 k to ~150 k per round. The root cause turned out to be tool ordering: the server returned its tool list in a nondeterministic order, which broke Anthropic's prompt‑cache mechanism.

Anthropic bills cached input tokens at 1/10 of the normal rate, but only when the request prefix matches the prefix of an earlier request byte‑for‑byte. Claude Code builds each request as [tools list] → [system prompt] → [messages history], so the tools section comes first: if any byte in it changes, the cache is missed for everything that follows.

tools list changes → cache invalidated

static part of system prompt changes → cache invalidated

only messages appended → cache likely hit

The MCP server shuffled the tool list on every start, so the tools byte sequence never stayed the same.
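The effect of ordering alone can be reproduced in a few lines. The sketch below is a simplified model, not Anthropic's real cache key: it assumes the cacheable prefix is just the JSON serialization of the tools followed by the system prompt, and `ToolDef`, `serializePrefix` and `commonPrefixLength` are hypothetical helpers. The same two tools, listed in two different orders, produce prefixes that diverge almost immediately.

```typescript
// Simplified model: the cacheable prefix is the serialized tools list followed
// by the system prompt. ToolDef and serializePrefix are hypothetical.
interface ToolDef {
  name: string;
  description: string;
}

function serializePrefix(tools: ToolDef[], system: string): string {
  return JSON.stringify({ tools, system });
}

// Number of leading characters two strings have in common.
function commonPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const queryTool: ToolDef = { name: "query_database", description: "Query the database with SQL" };
const writeTool: ToolDef = { name: "write_file", description: "Write content to a file" };
const systemPrompt = "You are a coding assistant.";

const firstStart = serializePrefix([queryTool, writeTool], systemPrompt);
const secondStart = serializePrefix([writeTool, queryTool], systemPrompt); // same tools, shuffled

console.log(firstStart === secondStart); // false
// The strings diverge at the first tool name, so the system prompt and every
// message after it can no longer match the cached prefix.
console.log(commonPrefixLength(firstStart, secondStart)); // 19
```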

02 assembleToolPool: how the tool pool is built

The function assembleToolPool (found in src/services/mcp/ and src/tools.ts) merges built‑in tools with MCP tools:

async function assembleToolPool(config, mcpClients, opts) {
  // 1. Fixed built‑in tools (order fixed at compile time)
  const builtinTools = getAllBaseTools(config, opts)

  // 2. Gather tools from each MCP server (order depends on server response)
  const mcpTools = []
  for (const [serverName, client] of Object.entries(mcpClients)) {
    const { tools } = await client.listTools() // order may differ each call!
    for (const tool of tools) {
      mcpTools.push(wrapMcpTool(serverName, tool))
    }
  }

  // 3. Concatenate – built‑ins first, MCP tools after (no sorting!)
  return [...builtinTools, ...mcpTools]
}

Because mcpTools is not sorted, any variation in the server‑provided order breaks the cache.
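One way to remove the dependency on server response order is to sort each server's tools before concatenating them. The sketch below is a self-contained stand-in, not Claude Code's code: `McpToolInfo`, `ToolClient`, `PoolTool`, `byName` and the simplified `wrapMcpTool` are hypothetical. It compares names by UTF-16 code unit rather than with `localeCompare`, so the order does not depend on the host locale, and it also visits servers in sorted key order.

```typescript
// Hypothetical minimal types standing in for Claude Code's internal ones.
interface McpToolInfo {
  name: string;
  description?: string;
}

interface ToolClient {
  listTools(): Promise<{ tools: McpToolInfo[] }>;
}

interface PoolTool {
  name: string;
  description: string;
}

// Deterministic, locale-independent comparison by UTF-16 code units.
function byName(a: { name: string }, b: { name: string }): number {
  return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
}

// Simplified stand-in for wrapMcpTool: namespaced name plus description only.
function wrapMcpTool(serverName: string, tool: McpToolInfo): PoolTool {
  return { name: `mcp__${serverName}__${tool.name}`, description: tool.description ?? "" };
}

async function assembleToolPool(
  builtinTools: PoolTool[],
  mcpClients: Record<string, ToolClient>,
): Promise<PoolTool[]> {
  const mcpTools: PoolTool[] = [];
  // Visit servers in sorted key order too, so reordering the config is harmless.
  for (const serverName of Object.keys(mcpClients).sort()) {
    const { tools } = await mcpClients[serverName].listTools();
    // Sort a copy: whatever order the server used, the result is the same.
    for (const tool of [...tools].sort(byName)) {
      mcpTools.push(wrapMcpTool(serverName, tool));
    }
  }
  // Built-ins keep their compile-time order; MCP tools follow in a fixed order.
  return [...builtinTools, ...mcpTools];
}
```

Sorting costs almost nothing next to a network round trip, and it turns "whatever order the server happened to use" into a property of the client that no server can break.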

03 SYSTEM_PROMPT_DYNAMIC_BOUNDARY: the cache watershed

The constant SYSTEM_PROMPT_DYNAMIC_BOUNDARY separates static and dynamic sections of the system prompt. Static parts (CORE_RULES, TOOL_DESCRIPTIONS, SAFETY_RULES) are shared across all sessions and cached globally. Dynamic parts (project‑specific claudeMd, memoryIndex, skillsSection, mcpState) are cached per session.

Consequences for MCP:

Tool description text lives in staticParts because it is stable.

Connection state lives in dynamicParts and only affects session‑level cache.
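On the API side, a split like this maps onto the Messages API's `system` parameter, which accepts an array of text blocks; a `cache_control: { type: "ephemeral" }` marker on a block ends a cacheable prefix there. The sketch below assumes the boundary is a literal marker string inside the assembled prompt. The marker value and `splitSystemPrompt` are hypothetical, and Claude Code may apply the constant differently.

```typescript
// Hypothetical marker value: the article names the constant but not its contents.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "<<SYSTEM_PROMPT_DYNAMIC_BOUNDARY>>";

// Shape of a text block in the Messages API `system` array.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function splitSystemPrompt(prompt: string): SystemTextBlock[] {
  const idx = prompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  const staticPart = idx === -1 ? prompt : prompt.slice(0, idx);
  const dynamicPart = idx === -1 ? "" : prompt.slice(idx + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length);

  // The cache breakpoint sits on the static block, so tools + static rules
  // form the cacheable prefix.
  const blocks: SystemTextBlock[] = [
    { type: "text", text: staticPart, cache_control: { type: "ephemeral" } },
  ];
  // Session-specific text (claudeMd, memory index, MCP state) goes after the
  // breakpoint; editing it cannot invalidate the prefix before it.
  if (dynamicPart.length > 0) {
    blocks.push({ type: "text", text: dynamicPart });
  }
  return blocks;
}
```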

04 Fourteen cache‑break vectors and sticky‑latch semantics

The module promptCacheBreakDetection.ts defines 14 vectors such as tool_list_mutated, mcp_server_registered, system_prompt_dynamic_section, etc. Each vector has a StickyLatch that, once tripped, remains true for the rest of the session, preventing automatic cache recovery.

This explains why a session may start fast and become slower: an early event (e.g., loading a sub‑directory) flips a latch, and every subsequent request carries the “cache broken” flag.
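The latch semantics fit in a few lines. This is an illustrative model, not the contents of promptCacheBreakDetection.ts: only three of the fourteen vector names come from the article, and the shape of `StickyLatch` and the hypothetical `CacheBreakDetector` (method and property names) are assumptions.

```typescript
// Three of the fourteen vector names from the article; the rest are omitted.
type CacheBreakVector =
  | "tool_list_mutated"
  | "mcp_server_registered"
  | "system_prompt_dynamic_section";

// A latch that can only go from false to true. There is intentionally no reset().
class StickyLatch {
  private tripped = false;
  private firstReason: string | undefined;

  trip(reason: string): void {
    if (this.tripped) return; // later trips do not overwrite the original cause
    this.tripped = true;
    this.firstReason = reason;
  }

  get isTripped(): boolean {
    return this.tripped;
  }

  get reason(): string | undefined {
    return this.firstReason;
  }
}

class CacheBreakDetector {
  private latches = new Map<CacheBreakVector, StickyLatch>();

  trip(vector: CacheBreakVector, reason: string): void {
    let latch = this.latches.get(vector);
    if (!latch) {
      latch = new StickyLatch();
      this.latches.set(vector, latch);
    }
    latch.trip(reason);
  }

  // Stays true for the rest of the session once any vector has tripped.
  get cacheBroken(): boolean {
    return Array.from(this.latches.values()).some(l => l.isTripped);
  }

  trippedVectors(): CacheBreakVector[] {
    return Array.from(this.latches.entries())
      .filter(([, latch]) => latch.isTripped)
      .map(([vector]) => vector);
  }
}
```

Keeping only the first reason is a deliberate choice in this sketch: when a session slows down, the useful question is which event broke the cache first.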

05 Wrapping MCP tools: from JSON‑RPC to Tool<Input, Output>

function wrapMcpTool(serverName, mcpTool) {
  return buildTool({
    name: `mcp__${serverName}__${mcpTool.name}`,
    description: mcpTool.description ?? "",
    inputSchema: mcpTool.inputSchema,
    async execute(input, context) {
      const client = getMcpClient(serverName)
      if (!client) throw new Error(`MCP server ${serverName} not connected`)
      const result = await client.callTool({
        name: mcpTool.name,
        arguments: input as Record<string, unknown>
      })
      return formatMcpResult(result)
    },
    isReadOnly: false, // conservative default: assume write
    requiresApproval: true // unknown source needs user confirmation
  })
}

The mcp__serverName__ prefix provides namespace isolation, so MCP tools cannot collide with built‑ins or with tools from other servers, and it makes MCP calls immediately recognizable in logs.
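Because the prefix is regular, log tooling can split a name back into its parts. A minimal sketch, assuming server keys never contain a double underscore (tool names may); `buildMcpToolName` and `parseMcpToolName` are hypothetical helpers, not Claude Code functions.

```typescript
// Hypothetical helpers for the mcp__<server>__<tool> naming convention.
const MCP_PREFIX = "mcp__";
const NAME_SEPARATOR = "__";

function buildMcpToolName(serverName: string, toolName: string): string {
  return `${MCP_PREFIX}${serverName}${NAME_SEPARATOR}${toolName}`;
}

// Returns null for names that are not MCP tools (e.g. built-ins).
// Splits at the first "__" after the prefix, so server keys must not contain "__";
// tool names may.
function parseMcpToolName(fullName: string): { serverName: string; toolName: string } | null {
  if (!fullName.startsWith(MCP_PREFIX)) return null;
  const rest = fullName.slice(MCP_PREFIX.length);
  const idx = rest.indexOf(NAME_SEPARATOR);
  if (idx <= 0 || idx + NAME_SEPARATOR.length >= rest.length) return null;
  return {
    serverName: rest.slice(0, idx),
    toolName: rest.slice(idx + NAME_SEPARATOR.length),
  };
}

// Example: keep only MCP calls from a list of logged tool names.
const loggedCalls = ["Read", "mcp__my-db-tools__query_database", "Bash"];
const mcpCalls = loggedCalls.map(n => parseMcpToolName(n)).filter(p => p !== null);
console.log(mcpCalls);
```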

06 Connection lifecycle: start, reconnect, and tool refresh

class McpConnectionManager {
  private clients = new Map<string, McpClient>()
  private toolCache = new Map<string, Tool[]>()

  async connectAll(config) {
    const entries = Object.entries(config.mcpServers)
    await Promise.allSettled(
      entries.map(([name, serverConfig]) => this.connectOne(name, serverConfig))
    )
  }

  async connectOne(name, config) {
    try {
      const client = await createMcpClient(config)
      await client.connect()
      const { tools } = await client.listTools()
      // Sort a copy by code unit: localeCompare depends on the host locale,
      // which could order the same names differently on different machines
      const sortedTools = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
      this.toolCache.set(name, sortedTools.map(t => wrapMcpTool(name, t)))
      this.clients.set(name, client)
    } catch (err) {
      logger.warn(`MCP server ${name} failed to connect: ${err}`)
      this.toolCache.set(name, []) // a failed server contributes an empty tool list
    }
  }

  async refreshTools(serverName) {
    const client = this.clients.get(serverName)
    if (!client) return
    const { tools } = await client.listTools()
    const sortedTools = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    this.toolCache.set(serverName, sortedTools.map(t => wrapMcpTool(serverName, t)))
    cacheBreakDetection.trip("tool_list_mutated", { serverName, reason: "manual tool refresh" })
  }
}
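As written, refreshTools trips tool_list_mutated on every manual refresh, even when the server returns exactly the same tools. One possible refinement, which is an assumption and not something the article describes, is to compare a canonical fingerprint of the old and new lists and trip the latch only when they differ. `ListedTool`, `toolFingerprint` and `shouldTripOnRefresh` are hypothetical names.

```typescript
// Hypothetical refinement: only trip tool_list_mutated when the tools really changed.
interface ListedTool {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

// Canonical form of a tool list: sorted by name (code-unit order), fixed field order.
// JSON.stringify preserves each schema's own key order, so this assumes the server
// emits schemas with stable key order.
function toolFingerprint(tools: ListedTool[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return JSON.stringify(sorted.map(t => [t.name, t.description ?? "", t.inputSchema ?? null]));
}

// Once the list is sorted, a refresh that returns the same tools in any order
// leaves the cached prefix untouched, so it need not trip the latch.
function shouldTripOnRefresh(previous: ListedTool[], next: ListedTool[]): boolean {
  return toolFingerprint(previous) !== toolFingerprint(next);
}
```

Inside refreshTools, the trip call would then sit behind `if (shouldTripOnRefresh(previousTools, tools))`, assuming the previous raw list is kept alongside toolCache.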

07 Common pitfalls after adding MCP

Pitfall 1: Tool list order changes on each start → token usage spikes. Fix: sort the list alphabetically before returning.

Pitfall 2: Reconnecting to an MCP server mid‑session flips the mcp_server_registered latch → the cache stays broken for the rest of the session. Fix: avoid mid‑session reconnections; connect all servers up front and keep the connections alive.

Pitfall 3: Tool description contains dynamic data (timestamps, versions) → cache miss even with stable order. Fix: keep description static.

Pitfall 4: serverName is generated dynamically (e.g., UUID) → name changes break cache. Fix: use a stable, semantic key such as my-db-tools.

Pitfall 5: MCP server fails to connect and the code does not handle the rejected promise → many tools become unavailable. Fix: ensure failed servers return an empty tool list instead of throwing.
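Pitfalls 1, 3 and 4 can be caught before a server ships with a small lint pass over its listTools output. The sketch below is heuristic: the regular expressions for "dynamic" text and the UUID check are assumptions that will miss some cases and may flag others, and `ToolEntry` and `lintMcpServer` are hypothetical.

```typescript
interface ToolEntry {
  name: string;
  description: string;
}

// Heuristic patterns for dynamic content in descriptions (assumed, not exhaustive).
const DYNAMIC_PATTERNS: RegExp[] = [
  /\d{4}-\d{2}-\d{2}/, // ISO-style dates
  /\d{1,2}:\d{2}/, // clock times
  /GMT[+-]\d{4}/, // output of Date.prototype.toString()
  /\bv\d+(\.\d+)*\b/, // inline versions such as "v1.2.3"
];

const UUID_LIKE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function lintMcpServer(serverKey: string, tools: ToolEntry[]): string[] {
  const problems: string[] = [];
  // Pitfall 4: generated server keys change the mcp__<key>__ prefix.
  if (UUID_LIKE.test(serverKey)) {
    problems.push(`server key "${serverKey}" looks generated; use a stable semantic key`);
  }
  // Pitfall 1: report the first out-of-order pair (code-unit order).
  for (let i = 1; i < tools.length; i++) {
    if (tools[i - 1].name > tools[i].name) {
      problems.push(`tools not sorted: "${tools[i - 1].name}" before "${tools[i].name}"`);
      break;
    }
  }
  // Pitfall 3: descriptions that look like they embed dates, times or versions.
  for (const t of tools) {
    if (DYNAMIC_PATTERNS.some(p => p.test(t.description))) {
      problems.push(`description of "${t.name}" looks dynamic: ${JSON.stringify(t.description)}`);
    }
  }
  return problems;
}
```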

08 Design insights

Insight 1: A stable prefix is the core cost constraint for agents. Tool definitions alone occupy 20‑30 k tokens per round, and a cache hit bills those tokens at one tenth of the normal rate, a 90 % saving on that portion of the input.

Insight 2: Fail‑closed philosophy extends to MCP tools: they default to requiresApproval: true, prompting the user rather than silently executing unknown actions.

Insight 3: The sticky latch embodies a kind of “honest pessimism”: once a cache‑break vector trips, the session never pretends the cache could recover.

Insight 4: The mcp__serverName__toolName naming convention provides built‑in observability; logs instantly differentiate MCP tools from built‑ins.
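To put Insight 1's 90 % in per-round terms: the discount applies to cached tokens only, so the saving on a whole request depends on how much of the input is cached. The sketch assumes cache reads are billed at one tenth of the normal input rate, ignores any surcharge for writing the cache, and uses illustrative token counts; `relativeInputCost` is a hypothetical helper.

```typescript
// Relative input cost of one request, normalized so 1 = every token at full price.
// Assumes cached prefix tokens are billed at readMultiplier (1/10 per the article)
// and ignores any extra charge for writing the cache.
function relativeInputCost(
  cachedTokens: number,
  uncachedTokens: number,
  readMultiplier = 0.1,
): number {
  const total = cachedTokens + uncachedTokens;
  if (total === 0) return 0;
  return (cachedTokens * readMultiplier + uncachedTokens) / total;
}

// Illustrative round: 25k tokens of tool definitions read from cache, 5k new tokens.
console.log(relativeInputCost(25_000, 5_000)); // 0.25
// The same round after a cache break: nothing is discounted.
console.log(relativeInputCost(0, 30_000)); // 1
```

Under these assumptions, a round that reuses 25 k cached tokens costs about a quarter of the uncached price, which is the gap that opens up when a tool-order change forces every round back to full price.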

09 Critical perspective: limits of the current design

The tool pool is assembled once at startup, with no hot‑update capability. Frequent MCP changes force full refreshes, and each refresh breaks the cache.

Cross‑server tool dependencies lack declarative management; developers must manually order calls.

Server health is opaque – failures only emit a warning, silently removing tools and making agent behavior hard to debug.

The tool_list_mutated vector is coarse‑grained; any change triggers full cache invalidation. Incremental invalidation would require API support.

10 Practical checklist for building an MCP server

Key rules for cache stability:

Use a stable, semantic server key (affects tool name prefix).

Return the tool list sorted alphabetically.

Keep tool descriptions as purely static text.

Handle connection errors by returning an empty list, not by throwing.

Limit total tools to around 20 to keep token usage reasonable.

import { ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js"

// Correct MCP server definition (server is an MCP SDK Server instance)
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      { name: "query_database", description: "Query the database with SQL", inputSchema: {/* stable JSON schema */} },
      { name: "write_file", description: "Write content to a file", inputSchema: {/* stable JSON schema */} }
    ].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0)) // ✅ fixed, locale-independent ordering
  }
})

// Incorrect example – unordered and dynamic description
server.setRequestHandler(ListToolsRequestSchema, async () => {
  const tools = await db.getTools() // order not guaranteed
  return {
    tools: tools.map(t => ({
      name: t.name,
      description: `Tool v${t.version} - updated at ${new Date()}`,
      inputSchema: t.schema
    })) // ❌ dynamic description, no sorting
  }
})
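The incorrect example can be repaired without changing where the tools come from: keep the per-tool data, drop the version and timestamp from the description, and sort by name. The sketch isolates that transformation as a pure function so it can be unit-tested; `DbTool`, its `summary` field and `toStableToolList` are hypothetical, with `summary` standing in for whatever static description text the database holds.

```typescript
// Hypothetical shape of a tool record loaded by db.getTools().
interface DbTool {
  name: string;
  summary: string; // static, human-written description
  version: string; // deliberately kept out of the description
  schema: Record<string, unknown>;
}

interface ToolListing {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Pure transformation: same records in, same bytes out, regardless of input order.
function toStableToolList(dbTools: DbTool[]): ToolListing[] {
  return dbTools
    .map(t => ({
      name: t.name,
      description: t.summary, // no version, no timestamp
      inputSchema: t.schema,
    }))
    // map() returned a fresh array, so sorting it in place is safe.
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// In the handler (db and server as in the incorrect example):
// server.setRequestHandler(ListToolsRequestSchema, async () => ({
//   tools: toStableToolList(await db.getTools()),
// }))
```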

Production‑grade configuration should use a stable key and static arguments, e.g.:

{
  "mcpServers": {
    "stable-name-here": {
      "command": "node",
      "args": ["/path/to/mcp-server/index.js"],
      "env": { "NODE_ENV": "production" }
    }
  }
}

Summary

The prompt cache hinges on an immutable request prefix; any change in the tools list invalidates it. SYSTEM_PROMPT_DYNAMIC_BOUNDARY separates globally cached static rules from per‑session dynamic data.

Fourteen sticky‑latch vectors, once triggered, permanently break cache for the session.

Namespaced tool prefixes (mcp__server__tool) improve observability.

Fail‑closed defaults keep agents safe by requiring user approval for external tools.

The next article will explore the Hooks system (stopHooks) and how Claude Code self‑corrects without crashing.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
