Why Your Agent Isn’t Stupid—It’s Just Lost in the Middle of the Context

Adding dozens of MCP tools overloads the LLM’s context window, causing the “lost in the middle” effect that degrades accuracy, but a gateway with semantic tool discovery, role‑based virtual servers, and pre‑filtering can restore performance while preserving governance.

AI Engineer Programming

Session Start Overhead

When a new MCP server is added, the agent stalls, accuracy drops, and latency rises. The cause is that the LLM reads every tool definition the server exposes before any user input arrives, so the context window fills with tokens that are mostly irrelevant to the task at hand.

Token Overhead of MCPs

The official GitHub MCP ships with >90 tools. Its tool definitions alone load ~50 000 tokens, consuming 30‑40% of a typical 128K‑token window before the user says a word. Adding Slack, Jira, and SonarQube MCPs multiplies this cost.
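The arithmetic behind that percentage can be checked in a few lines. This is only a sketch using the two approximate figures quoted above; the helper name `window_share` is made up for illustration.

```python
WINDOW_TOKENS = 128_000      # context window size quoted in the text
GITHUB_MCP_TOKENS = 50_000   # approximate tool-definition cost quoted in the text


def window_share(tool_tokens: int, window: int = WINDOW_TOKENS) -> float:
    """Fraction of the context window consumed by tool definitions."""
    return tool_tokens / window


share = window_share(GITHUB_MCP_TOKENS)
remaining = WINDOW_TOKENS - GITHUB_MCP_TOKENS
print(f"GitHub MCP share of window: {share:.0%}, tokens left: {remaining}")
```

With these inputs the share comes out at about 39%, the upper end of the 30‑40% range, leaving 78 000 tokens for everything else.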

Empirical observations of context size vs. model accuracy:

1 000–5 000 tokens → >90% accuracy

10 000–50 000 tokens → gradual decline

>500 000 tokens (danger zone) → 40–50% accuracy

MCP vs. CLI

CLI is simpler, faster, and cheaper, suitable for personal or local use. MCP provides permissions, monitoring, governance, and enterprise‑scale extensibility. They solve different problems rather than compete.

Semantic Tool Discovery

Keyword search matches literal words; semantic search matches intent. Using embeddings, a query such as “Summarize all open bugs and create a ticket for the most severe one” can be matched to the few relevant MCP endpoints.

Jira MCP
├── jira_search_issues   "Search and filter Jira tickets by status and priority"
├── jira_create_issue    "Create a new Bug ticket or task in Jira"
├── jira_update_issue    "Update fields of an existing Jira ticket"
└── jira_get_project     "Get project details and metadata"

GitHub MCP
├── github_list_issues   "List open issues in a repository"
├── github_create_issue  "Create a new Issue in a GitHub repo"
└── github_open_pr       "Open a Pull Request between branches"

SonarQube MCP
├── sonarqube_get_issues      "Retrieve bugs and security vulnerabilities detected by SonarQube"
└── sonarqube_get_code_smells "Get code smell and technical debt reports"

Confluence MCP
├── confluence_search_pages "Search pages in the Confluence knowledge base"
└── confluence_create_page  "Create a new page in Confluence"

Slack MCP
├── slack_send_message  "Send a message to a Slack channel or user"
└── slack_list_channels "List available Slack channels"

(≈50 MCPs, ≈200+ endpoints)

Semantic search computes an embedding for each tool description and compares it to the embedding of the user's intent. For the request above it returns only the four tools that are needed, keeping the context concise.
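The ranking step can be sketched as cosine similarity between a query vector and one vector per tool description. Everything here is an assumption for illustration: `embed` is a toy word-count vector with a small stop-word list so the example runs without dependencies, and it captures no real semantics. A real gateway would replace it with a call to an embedding model, while the ranking logic would stay the same.

```python
import math
import re
from collections import Counter

# A few tool descriptions copied from the listing above.
TOOLS = {
    "jira_search_issues": "Search and filter Jira tickets by status and priority",
    "jira_create_issue": "Create a new Bug ticket or task in Jira",
    "github_open_pr": "Open a Pull Request between branches",
    "sonarqube_get_issues": "Retrieve bugs and security vulnerabilities detected by SonarQube",
    "slack_send_message": "Send a message to a Slack channel or user",
}

STOP_WORDS = {"a", "an", "the", "for", "to", "or", "in", "and", "by", "of"}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of non-stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_tools(query: str, k: int = 4) -> list[str]:
    """Return the k tool names whose descriptions are closest to the query."""
    q = embed(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, embed(TOOLS[name])), reverse=True)
    return ranked[:k]


print(top_tools("create a ticket for the most severe bug", k=1))
```

Because the toy vectors only match literal words, "bugs" and "bug" do not match here. Closing that gap is exactly what a real embedding model is for.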

Hybrid Search

Developer writes “create bug ticket” → vector search yields jira_create_issue

Developer explicitly writes “jira_create_issue” → keyword match returns the same tool with a higher score

Both paths converge on the same tool.
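One common way to combine the two signals is a weighted sum of a semantic score and a keyword score. The sketch below assumes that approach; the weight `alpha`, the Jaccard word-overlap stand-in for a vector score, and the exact-name keyword rule are all illustrative choices rather than anything the article specifies.

```python
import re

TOOLS = {
    "jira_create_issue": "Create a new Bug ticket or task in Jira",
    "github_create_issue": "Create a new Issue in a GitHub repo",
    "slack_send_message": "Send a message to a Slack channel or user",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def semantic_score(query: str, description: str) -> float:
    """Stand-in for vector similarity: Jaccard overlap of word sets."""
    q, d = words(query), words(description)
    return len(q & d) / len(q | d) if q | d else 0.0


def keyword_score(query: str, tool_name: str) -> float:
    """1.0 when the exact tool name appears in the query, else 0.0."""
    return 1.0 if tool_name in query.lower() else 0.0


def hybrid_search(query: str, alpha: float = 0.5) -> tuple[str, float]:
    """Return the best tool and its blended score."""
    scores = {
        name: alpha * semantic_score(query, desc) + (1 - alpha) * keyword_score(query, name)
        for name, desc in TOOLS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


print(hybrid_search("create bug ticket"))
print(hybrid_search("jira_create_issue"))
```

Both queries resolve to `jira_create_issue`, and the explicit tool name earns the higher blended score because the keyword term contributes its full weight.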

Why the Gateway Is the Correct Place

An enterprise MCP registry solves visibility, access‑control, and monitoring problems. Each MCP must be registered, normalized, and approved before entering the environment.

When a new MCP is added, the pipeline automatically:

Computes embedding vectors from the tool’s name, description, and parameters.

Builds a combined semantic‑vector and keyword index.

At request time, retrieves the top 5–10 most relevant tools in milliseconds.

Thus the LLM never sees the remaining 200+ irrelevant tools.
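The register-then-retrieve flow can be sketched as a small registry that computes a vector once, at registration time, and only ranks precomputed vectors at request time. The class `ToolRegistry`, the injected `embed` callable, and the unnormalized dot-product ranking are assumptions made for this example. The keyword index from the second step is left out to keep it short.

```python
import re
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Placeholder for a real embedding model: word counts."""
    counts: Vector = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0.0) + 1.0
    return counts


def dot(a: Vector, b: Vector) -> float:
    return sum(value * b.get(key, 0.0) for key, value in a.items())


class ToolRegistry:
    def __init__(self, embed: Callable[[str], Vector] = toy_embed) -> None:
        self._embed = embed
        self._index: dict[str, Vector] = {}

    def register(self, name: str, description: str, parameters: list[str]) -> None:
        """Index a tool once, from its name, description, and parameter names."""
        text = " ".join([name.replace("_", " "), description, *parameters])
        self._index[name] = self._embed(text)

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        """Rank precomputed vectors against the query and keep the top k."""
        q = self._embed(query)
        ranked = sorted(self._index, key=lambda n: dot(q, self._index[n]), reverse=True)
        return ranked[:k]


registry = ToolRegistry()
registry.register("github_list_issues", "List open issues in a repository", ["repo"])
registry.register("jira_create_issue", "Create a new Bug ticket or task in Jira", ["summary", "priority"])
registry.register("slack_send_message", "Send a message to a Slack channel or user", ["channel", "text"])
print(registry.retrieve("list open issues in repository", k=1))
```

Only the names returned by `retrieve` would have their full definitions forwarded to the LLM. Passing `embed` in through the constructor keeps the model choice separate from the indexing logic.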

Virtual MCP Servers

Role‑based virtual endpoints expose only the tools needed for a given role:

/virtual/devops-tools   → GitHub, Jira, SonarQube
/virtual/support-tools  → Confluence, Jira, Slack
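A virtual server can be modeled as a mapping from an endpoint path to the set of backing MCP servers it exposes. The sketch below assumes that each tool name starts with its server name followed by an underscore, which holds for the tools listed earlier but is not something the article states as a rule.

```python
VIRTUAL_SERVERS = {
    "/virtual/devops-tools": {"github", "jira", "sonarqube"},
    "/virtual/support-tools": {"confluence", "jira", "slack"},
}

ALL_TOOLS = [
    "jira_search_issues", "jira_create_issue", "jira_update_issue", "jira_get_project",
    "github_list_issues", "github_create_issue", "github_open_pr",
    "sonarqube_get_issues", "sonarqube_get_code_smells",
    "confluence_search_pages", "confluence_create_page",
    "slack_send_message", "slack_list_channels",
]


def tools_for(endpoint: str) -> list[str]:
    """Return only the tools whose server prefix is exposed by this endpoint."""
    allowed = VIRTUAL_SERVERS[endpoint]
    return [tool for tool in ALL_TOOLS if tool.split("_", 1)[0] in allowed]


print(tools_for("/virtual/support-tools"))
```

A client connected to `/virtual/devops-tools` never sees the Slack or Confluence tools, so role scoping narrows the candidate set before any semantic ranking runs.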

Full Observability

Every call, activated tool, latency, and cost is logged, traced, and recorded as metrics. Integration with Langfuse or LangSmith provides end‑to‑end visibility of tool selection, payloads sent to the LLM, and returned results.
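At its simplest, per-call logging is a wrapper that records the tool name, outcome, and latency of every invocation. The sketch below uses an in-memory list as the sink and deliberately does not use the Langfuse or LangSmith SDKs. The decorator `traced`, the `CALL_LOG` list, and the placeholder tool body are all hypothetical, and cost tracking is omitted.

```python
import time
from functools import wraps

CALL_LOG: list[dict] = []  # stand-in for a real tracing backend


def traced(tool_name: str):
    """Decorator that records status and latency for each tool call."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"
            try:
                result = fn(*args, **kwargs)
                status = "ok"
                return result
            finally:
                # Runs on success and on exceptions, so failures are logged too.
                CALL_LOG.append({
                    "tool": tool_name,
                    "status": status,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorator


@traced("jira_create_issue")
def create_issue(summary: str) -> dict:
    # Placeholder body; a real gateway would forward the call to the MCP server.
    return {"summary": summary}


create_issue("Login page returns 500")
print(CALL_LOG[-1]["tool"], CALL_LOG[-1]["status"])
```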

AI‑Centric RBAC

The gateway enforces who can activate which tool and how many times, supporting operation throttling (e.g., limiting repeated Wiki deletions).
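A permission check combined with a sliding-window call limit covers both halves of that sentence. In this sketch the role names, per-tool limits, 60-second window, and the `ToolGuard` class are all invented for illustration. The clock is injectable so the behavior can be exercised without waiting.

```python
import time
from collections import defaultdict, deque

# role -> tool -> maximum calls per window (hypothetical values)
PERMISSIONS = {
    "devops": {"github_open_pr": 20, "jira_create_issue": 10},
    "support": {"confluence_create_page": 5, "slack_send_message": 30},
}
WINDOW_SECONDS = 60.0


class ToolGuard:
    def __init__(self, now=time.monotonic) -> None:
        self._now = now
        self._calls: dict[tuple[str, str], deque] = defaultdict(deque)

    def allow(self, role: str, user: str, tool: str) -> bool:
        """Return True if the role may call the tool and the user is under the limit."""
        limit = PERMISSIONS.get(role, {}).get(tool)
        if limit is None:
            return False  # the role has no access to this tool at all
        calls = self._calls[(user, tool)]
        now = self._now()
        while calls and now - calls[0] >= WINDOW_SECONDS:
            calls.popleft()  # drop calls that have aged out of the window
        if len(calls) >= limit:
            return False  # throttled
        calls.append(now)
        return True


guard = ToolGuard()
print(guard.allow("support", "ana", "confluence_create_page"))
print(guard.allow("devops", "ana", "slack_send_message"))
```

The first call is permitted and the second is refused, because the hypothetical `devops` role has no entry for `slack_send_message`.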

Agent‑to‑Agent (A2A) Scenario

In tool → Agent → Agent flows, the same discovery problem exists. Registering Agent cards (name, description, skills, address) enables semantic discovery of the correct peer, with the gateway acting as the trust layer.

Claude Code Tool Search (April 2026)

“When your MCP tool descriptions occupy >10% of the context, Claude Code detects this and automatically switches to a lazy‑load mode instead of pre‑loading.”

Claude Code performs lazy loading: tools are loaded on demand after the request reaches the model. The gateway’s semantic filter pre‑filters tools before the LLM sees them, saving tokens and preventing loss of focus.

Paradigm Shift: From Hard‑Coding to Dynamic Discovery

Static loading hard‑codes all tools, causing a heavy context tax. Moving to gateway‑based semantic discovery frees developers from manual tool selection.

LiteLLM now supports MCP Toolsets and semantic discovery.

AWS AgentCore Registry includes built‑in hybrid search.

The model no longer needs to “know” a tool; it appears only when relevant.

Next Steps and Metrics

The context problem is moved to a more appropriate layer: the gateway decides which tools to show. Trust is placed in deterministic embedding vectors rather than stochastic model decisions.

Three key metrics indicate healthy operation:

Tool selection accuracy – correct tools selected / total activations.

Token efficiency – tokens sent after semantic filtering vs. static loading.

Retrieval latency – additional milliseconds added before the LLM call.

A semantic tool‑discovery system that does not measure these is not production‑ready.
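The three metrics can be computed from per-request records along these lines. This sketch assumes each record carries a labeled set of expected tools (which implies some evaluation set exists) plus token counts for both loading modes. The `RequestRecord` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    selected: set[str]      # tools the gateway surfaced and the agent activated
    expected: set[str]      # tools a reviewer labeled as correct (assumed available)
    tokens_filtered: int    # tool tokens sent after semantic filtering
    tokens_static: int      # tool tokens that static loading would have sent
    retrieval_ms: float     # time spent in retrieval before the LLM call


def tool_selection_accuracy(records: list[RequestRecord]) -> float:
    """Correct tools selected divided by total activations."""
    correct = sum(len(r.selected & r.expected) for r in records)
    total = sum(len(r.selected) for r in records)
    return correct / total if total else 0.0


def token_ratio(records: list[RequestRecord]) -> float:
    """Filtered tokens as a fraction of static-loading tokens (lower is better)."""
    static = sum(r.tokens_static for r in records)
    return sum(r.tokens_filtered for r in records) / static if static else 0.0


def mean_retrieval_ms(records: list[RequestRecord]) -> float:
    return sum(r.retrieval_ms for r in records) / len(records) if records else 0.0


records = [
    RequestRecord({"jira_create_issue", "slack_send_message"}, {"jira_create_issue"}, 2_000, 50_000, 10.0),
    RequestRecord({"github_open_pr"}, {"github_open_pr"}, 3_000, 50_000, 20.0),
]
print(tool_selection_accuracy(records), token_ratio(records), mean_retrieval_ms(records))
```

The sample numbers are made up purely to exercise the functions and are not measurements.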

GitHub repository for the semantic gateway: https://github.com/codeninja/mcp-semantic-gateway

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
