Why Your Agent Isn’t Stupid—It’s Just Lost in the Middle of the Context
Adding dozens of MCP tools overloads the LLM’s context window, causing the “lost in the middle” effect that degrades accuracy, but a gateway with semantic tool discovery, role‑based virtual servers, and pre‑filtering can restore performance while preserving governance.
Session Start Overhead
When a new MCP is added, the agent stalls, accuracy drops, and latency rises. The cause is that the LLM receives every tool definition that MCP exposes before any user input arrives, which fills the context window with mostly irrelevant tokens.
Token Overhead of MCPs
The official GitHub MCP ships with more than 90 tools. Their definitions add roughly 50 000 tokens to each request, consuming 30–40% of a typical 128K-token window before the user says a word. Adding Slack, Jira, and SonarQube MCPs multiplies this cost.
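The percentage is plain arithmetic. Here is a minimal sketch. It assumes "128K" means 128 000 tokens and uses the ~50 000-token figure above. `context_budget` is a hypothetical helper, not part of any MCP SDK:

```python
def context_budget(tool_tokens: int, window_tokens: int) -> tuple[float, int]:
    """Return (share of the window used by tool definitions, tokens left for the conversation)."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    share = tool_tokens / window_tokens
    remaining = max(window_tokens - tool_tokens, 0)  # never report a negative budget
    return share, remaining


# GitHub MCP alone, using the figures from the text.
share, remaining = context_budget(50_000, 128_000)
print(f"{share:.1%} of the window used, {remaining} tokens left")
# prints "39.1% of the window used, 78000 tokens left"
```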
Empirical observations of context size vs. model accuracy:
1 000–5 000 tokens → >90% accuracy
10 000–50 000 tokens → gradual decline
>500 000 tokens (danger zone) → 40–50% accuracy
MCP vs. CLI
A CLI is simpler, faster, and cheaper, and it suits personal or local use. MCP adds permissions, monitoring, governance, and enterprise-scale extensibility. The two solve different problems rather than compete.
Semantic Tool Discovery
Keyword search matches literal words; semantic search matches intent. Using embeddings, a query such as “Summarize all open bugs and create a ticket for the most severe one” can be matched to the few relevant MCP endpoints.
Jira MCP
├── jira_search_issues "Search and filter Jira tickets by status and priority"
├── jira_create_issue "Create a new Bug ticket or task in Jira"
├── jira_update_issue "Update fields of an existing Jira ticket"
└── jira_get_project "Get project details and metadata"
GitHub MCP
├── github_list_issues "List open issues in a repository"
├── github_create_issue "Create a new Issue in a GitHub repo"
└── github_open_pr "Open a Pull Request between branches"
SonarQube MCP
├── sonarqube_get_issues "Retrieve bugs and security vulnerabilities detected by SonarQube"
└── sonarqube_get_code_smells "Get code smell and technical debt reports"
Confluence MCP
├── confluence_search_pages "Search pages in the Confluence knowledge base"
└── confluence_create_page "Create a new page in Confluence"
Slack MCP
├── slack_send_message "Send a message to a Slack channel or user"
└── slack_list_channels "List available Slack channels"
(≈50 MCPs, 200+ endpoints)
Semantic search computes an embedding for each tool description and compares it with the embedding of the request's intent. It returns only the four tools this request needs, which keeps the context concise.
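The ranking step can be sketched as cosine similarity between a query vector and one vector per tool description. This minimal sketch rests on one loud assumption: `toy_embed` is a bag-of-words counter standing in for a real embedding model, so it rewards shared words rather than shared intent. The catalog is a subset of the listing above, and all function names are hypothetical.

```python
import math
from collections import Counter

# A subset of the tool listing above: name -> description.
TOOLS = {
    "jira_search_issues": "Search and filter Jira tickets by status and priority",
    "jira_create_issue": "Create a new Bug ticket or task in Jira",
    "github_list_issues": "List open issues in a repository",
    "sonarqube_get_issues": "Retrieve bugs and security vulnerabilities detected by SonarQube",
    "slack_send_message": "Send a message to a Slack channel or user",
    "confluence_create_page": "Create a new page in Confluence",
}


def toy_embed(text: str) -> Counter:
    """Placeholder for a real embedding model: a term-count vector (word overlap only)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_tools(query: str, k: int = 4) -> list[tuple[str, float]]:
    """Score every tool description against the query and keep the k best."""
    q = toy_embed(query)
    scored = [(name, cosine(q, toy_embed(desc))) for name, desc in TOOLS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Replacing `toy_embed` with calls to an actual embedding model is what turns this word-overlap ranking into the intent matching described above. The cosine and top-k steps stay the same.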
Hybrid Search
Developer writes “create bug ticket” → vector search yields jira_create_issue
Developer explicitly writes “jira_create_issue” → keyword match returns the same tool with a higher score
Both paths converge on the same tool.
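One common way to merge the vector and keyword result lists is reciprocal rank fusion (RRF). The text does not say which fusion method a gateway should use. Treat RRF itself, the constant `k = 60`, and the hard-coded `VECTOR_HITS` as assumptions of this sketch. `VECTOR_HITS` stands in for real vector-search output and is assumed identical for both queries.

```python
from collections import defaultdict


def keyword_rank(query: str, tool_names: list[str]) -> list[str]:
    """Keyword path: return tools whose exact identifier appears in the query."""
    words = set(query.lower().split())
    return [name for name in tool_names if name in words]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each list adds 1 / (k + rank) for every tool it contains (rank starts at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, tool in enumerate(ranking, start=1):
            scores[tool] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


CATALOG = ["jira_create_issue", "github_create_issue", "confluence_create_page", "slack_send_message"]
# Stand-in for vector-search output (hypothetical), assumed the same for both queries.
VECTOR_HITS = ["jira_create_issue", "github_create_issue", "confluence_create_page"]

for query in ("create bug ticket", "jira_create_issue"):
    fused = reciprocal_rank_fusion([VECTOR_HITS, keyword_rank(query, CATALOG)])
    print(query, "->", fused[0])
```

Both queries put `jira_create_issue` first. The explicit identifier scores 2/61 instead of 1/61 because the keyword path agrees with the vector path.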
Why the Gateway Is the Correct Place
An enterprise MCP registry solves visibility, access‑control, and monitoring problems. Each MCP must be registered, normalized, and approved before entering the environment.
When a new MCP is added, the pipeline automatically:
Computes embedding vectors from the tool’s name, description, and parameters.
Builds a combined semantic‑vector and keyword index.
At request time, retrieves the top 5–10 most relevant tools in milliseconds.
Thus the LLM never sees the remaining 200+ irrelevant tools.
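The registration pipeline and the request-time lookup can be sketched as a small registry. Everything here is hypothetical: `ToolRegistry`, the approval flag, and the weighted score `alpha * semantic + (1 - alpha) * keyword` are illustrative choices, not a specific product's API. The hashed bag-of-words vectors are a stand-in for a real embedding model.

```python
import math
import zlib
from dataclasses import dataclass, field

DIM = 256  # hashed-vector size; an arbitrary choice for this sketch


def hashed_embedding(text: str) -> list[float]:
    """Stand-in for a real embedding model: hash each word into one of DIM buckets,
    then L2-normalise. Captures shared words only, not meaning."""
    vec = [0.0] * DIM
    for word in text.lower().replace("_", " ").split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class Tool:
    name: str
    description: str
    parameters: list[str] = field(default_factory=list)


class ToolRegistry:
    """Hypothetical gateway registry: indexes approved tools and serves top-k lookups."""

    def __init__(self) -> None:
        self._entries: list[tuple[Tool, list[float], set[str]]] = []

    def register(self, tool: Tool, approved: bool) -> None:
        if not approved:
            raise PermissionError(f"{tool.name} has not been approved")
        # Embed name + description + parameter names, as in the pipeline above.
        text = " ".join([tool.name, tool.description, *tool.parameters])
        keywords = set(text.lower().replace("_", " ").split()) | {tool.name.lower()}
        self._entries.append((tool, hashed_embedding(text), keywords))

    def search(self, query: str, k: int = 5, alpha: float = 0.7) -> list[str]:
        """Score = alpha * cosine similarity + (1 - alpha) * keyword overlap."""
        q_vec = hashed_embedding(query)
        q_words = set(query.lower().split())
        scored = []
        for tool, vec, keywords in self._entries:
            semantic = sum(a * b for a, b in zip(q_vec, vec))  # unit vectors, so dot = cosine
            keyword = len(q_words & keywords) / len(q_words) if q_words else 0.0
            scored.append((alpha * semantic + (1 - alpha) * keyword, tool.name))
        scored.sort(reverse=True)
        return [name for _, name in scored[:k]]


registry = ToolRegistry()
registry.register(Tool("jira_create_issue", "Create a new Bug ticket or task in Jira", ["project", "summary"]), approved=True)
registry.register(Tool("slack_send_message", "Send a message to a Slack channel or user", ["channel", "text"]), approved=True)
print(registry.search("create a jira bug ticket", k=1))
```

The default `k=5` matches the low end of the 5–10 range mentioned above. Unapproved tools never reach the index, so they can never be retrieved.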
Virtual MCP Servers
Role‑based virtual endpoints expose only the tools needed for a given role:
/virtual/devops-tools → GitHub, Jira, SonarQube
/virtual/support-tools → Confluence, Jira, Slack
Full Observability
Every call, activated tool, latency, and cost is logged, traced, and exported as metrics. Integration with Langfuse or LangSmith provides end-to-end visibility of tool selection, the payloads sent to the LLM, and the returned results.
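Here is a minimal sketch of per-call logging at the gateway. It assumes a flat per-call cost and uses an in-memory list as the sink. `ToolCallLog` and `ToolCallRecord` are hypothetical names. Exporting the records to Langfuse or LangSmith would go through those tools' own SDKs, which are not shown here.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCallRecord:
    tool: str
    latency_ms: float
    ok: bool
    cost_usd: float


class ToolCallLog:
    """Hypothetical in-memory sink; a real gateway would export records to a tracing backend."""

    def __init__(self) -> None:
        self.records: list[ToolCallRecord] = []

    def traced(self, tool: str, cost_usd: float = 0.0) -> Callable:
        """Decorator that records tool name, latency, success, and a flat per-call cost."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            @functools.wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                ok = False
                try:
                    result = fn(*args, **kwargs)
                    ok = True
                    return result
                finally:
                    # Runs on success and on exceptions, so failed calls are logged too.
                    latency_ms = (time.perf_counter() - start) * 1000
                    self.records.append(ToolCallRecord(tool, latency_ms, ok, cost_usd))
            return wrapper
        return decorator


log = ToolCallLog()


@log.traced("jira_search_issues", cost_usd=0.001)
def jira_search_issues(status: str) -> list[str]:
    return [f"stub result for status={status}"]  # placeholder for the real MCP call


jira_search_issues("open")
print(log.records[0])
```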
AI‑Centric RBAC
The gateway enforces who can activate which tool and how many times, supporting operation throttling (e.g., limiting repeated Wiki deletions).
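Role-based virtual servers and per-tool throttling can be combined in one authorization check. This sketch rests on several illustrative assumptions: the role names, the `<server>_<action>` naming rule, and a one-call quota on a hypothetical `confluence_delete_page` tool that stands in for a Wiki deletion.

```python
from collections import defaultdict

# Virtual servers as listed in the text.
VIRTUAL_SERVERS = {
    "/virtual/devops-tools": {"github", "jira", "sonarqube"},
    "/virtual/support-tools": {"confluence", "jira", "slack"},
}
# Hypothetical role mapping and quota; confluence_delete_page is an assumed tool name.
ROLE_TO_SERVER = {"devops": "/virtual/devops-tools", "support": "/virtual/support-tools"}
CALL_QUOTAS = {"confluence_delete_page": 1}


def server_of(tool: str) -> str:
    """Assumes tool names follow the '<server>_<action>' pattern seen in the listing."""
    return tool.split("_", 1)[0]


class RbacGateway:
    def __init__(self) -> None:
        self._calls: dict[tuple[str, str], int] = defaultdict(int)  # (user, tool) -> calls so far

    def visible_tools(self, role: str, catalog: list[str]) -> list[str]:
        """Tools exposed through the role's virtual server; everything else stays hidden."""
        allowed = VIRTUAL_SERVERS[ROLE_TO_SERVER[role]]
        return [tool for tool in catalog if server_of(tool) in allowed]

    def authorize(self, user: str, role: str, tool: str) -> bool:
        """Allow a call only if the tool is visible to the role and still under its quota."""
        if tool not in self.visible_tools(role, [tool]):
            return False
        limit = CALL_QUOTAS.get(tool)
        # Lifetime count for simplicity; a real limiter would use a time window.
        if limit is not None and self._calls[(user, tool)] >= limit:
            return False
        self._calls[(user, tool)] += 1
        return True


gw = RbacGateway()
print(gw.visible_tools("devops", ["github_open_pr", "slack_send_message", "jira_create_issue"]))
```

A production limiter would count calls per time window and persist the counters. The lifetime in-memory counter here only keeps the sketch short.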
Agent‑to‑Agent (A2A) Scenario
In tool → Agent → Agent flows, the same discovery problem exists. Registering Agent cards (name, description, skills, address) enables semantic discovery of the correct peer, with the gateway acting as the trust layer.
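Agent discovery can reuse the same pattern as tool discovery. In this sketch, the `AgentCard` fields are only the four named above; the actual A2A agent card schema has more. Jaccard word overlap stands in for embedding similarity, and the `trusted` set models the gateway acting as the trust layer. All names and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    # Only the fields named in the text; not the full A2A schema.
    name: str
    description: str
    skills: tuple[str, ...]
    address: str


def discover_agent(request: str, cards: list[AgentCard], trusted: set[str]) -> Optional[AgentCard]:
    """Return the trusted agent whose description and skills best overlap the request.
    Jaccard word overlap is a stand-in for embedding similarity."""
    req = set(request.lower().split())
    best, best_score = None, 0.0
    for card in cards:
        if card.address not in trusted:  # gateway as trust layer: skip unregistered peers
            continue
        words = set(" ".join((card.description, *card.skills)).lower().split())
        score = len(req & words) / len(req | words)
        if score > best_score:
            best, best_score = card, score
    return best


cards = [
    AgentCard("triage-agent", "Triages incoming bug reports",
              ("classify bugs", "assign severity"), "https://triage.example.com"),
    AgentCard("docs-agent", "Writes and updates documentation",
              ("summarize pages", "draft docs"), "https://docs.example.com"),
]
match = discover_agent("assign severity to new bugs", cards, trusted={c.address for c in cards})
print(match.name if match else None)
```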
Claude Code Tool Search (April 2026)
“When your MCP tool descriptions occupy >10% of the context, Claude Code detects this and automatically switches to a lazy‑load mode instead of pre‑loading.”
Claude Code performs lazy loading inside the client: tool definitions are loaded on demand after the request reaches the model. The gateway's semantic filter works one step earlier. It pre-filters tools before the LLM sees them, which saves tokens and keeps the model from losing focus.
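The 10% rule in the quote reduces to one comparison. This is a sketch of the rule as quoted, not Claude Code's implementation. `should_lazy_load` is a hypothetical name, and the token counts reuse the GitHub MCP figures from earlier.

```python
def should_lazy_load(tool_tokens: int, window_tokens: int, threshold: float = 0.10) -> bool:
    """True when tool descriptions would take more than `threshold` of the context window."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return tool_tokens / window_tokens > threshold


print(should_lazy_load(50_000, 128_000))  # True: 50 000 / 128 000 is about 0.39
print(should_lazy_load(5_000, 128_000))   # False: 5 000 / 128 000 is about 0.04
```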
Paradigm Shift: From Hard‑Coding to Dynamic Discovery
Static loading hard‑codes all tools, causing a heavy context tax. Moving to gateway‑based semantic discovery frees developers from manual tool selection.
LiteLLM now supports MCP Toolsets and semantic discovery.
AWS AgentCore Registry includes built‑in hybrid search.
The model no longer needs to “know” a tool; it appears only when relevant.
Next Steps and Metrics
The context problem is moved to a more appropriate layer: the gateway decides which tools to show. Trust is placed in deterministic embedding vectors rather than stochastic model decisions.
Three key metrics indicate healthy operation:
Tool selection accuracy – correct tools selected / total activations.
Token efficiency – tokens sent after semantic filtering vs. static loading.
Retrieval latency – additional milliseconds added before the LLM call.
Without measuring these, the system is not a production‑ready semantic tool‑discovery system.
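The three metrics can be computed from per-request gateway logs. In this minimal sketch, `RequestLog` and `gateway_metrics` are hypothetical. The `expected` field assumes someone has labelled the correct tools for each request, and the sample numbers are made up purely to exercise the code.

```python
from dataclasses import dataclass


@dataclass
class RequestLog:
    selected: set[str]    # tools the gateway exposed for this request
    expected: set[str]    # tools judged correct for the request (assumed ground-truth label)
    filtered_tokens: int  # tool-definition tokens sent after semantic filtering
    static_tokens: int    # tokens the full static catalog would have cost
    retrieval_ms: float   # time spent in discovery before the LLM call


def gateway_metrics(logs: list[RequestLog]) -> dict[str, float]:
    correct = sum(len(log.selected & log.expected) for log in logs)
    activations = sum(len(log.selected) for log in logs)
    filtered = sum(log.filtered_tokens for log in logs)
    static = sum(log.static_tokens for log in logs)
    return {
        "tool_selection_accuracy": correct / activations if activations else 0.0,
        "token_efficiency": filtered / static if static else 0.0,  # lower means bigger savings
        "mean_retrieval_ms": sum(log.retrieval_ms for log in logs) / len(logs) if logs else 0.0,
    }


# Illustrative, made-up logs.
logs = [
    RequestLog({"jira_search_issues", "jira_create_issue"}, {"jira_search_issues", "jira_create_issue"}, 2_000, 50_000, 12.0),
    RequestLog({"slack_send_message", "github_open_pr"}, {"slack_send_message"}, 1_500, 50_000, 8.0),
]
print(gateway_metrics(logs))
# prints {'tool_selection_accuracy': 0.75, 'token_efficiency': 0.035, 'mean_retrieval_ms': 10.0}
```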
GitHub repository for the semantic gateway: https://github.com/codeninja/mcp-semantic-gateway
