How to Build a High‑Performance Enterprise RAG System with Model Context Protocol (MCP)
This article presents a step‑by‑step guide for constructing a scalable enterprise Retrieval‑Augmented Generation (RAG) solution using the Model Context Protocol (MCP), covering architecture comparison, system design, Milvus‑backed knowledge store, Python client implementation, deployment scripts, code examples, and best‑practice recommendations.
In the wave of enterprise digital transformation, managing internal knowledge assets efficiently has become a critical challenge. Large language models (LLMs) make Retrieval‑Augmented Generation (RAG) a practical bridge between corporate knowledge and AI capabilities, but traditional RAG suffers from poor retrieval quality and difficult real‑time updates.
MCP vs. Traditional RAG
Limitations of Traditional RAG
Tightly Coupled Architecture: Retrieval logic and LLM calls are intertwined, making independent optimization hard.
Single Retrieval Strategy: Usually only vector search is used, lacking hybrid approaches.
Lack of Standardized Interfaces: Different implementations expose divergent APIs, preventing reuse.
High Maintenance Cost: System upgrades require extensive code changes.
Advantages of MCP‑Based Solution
Standardized Tool Calls: MCP defines a unified interface, reducing integration effort.
Decoupled Design: Model invocation is separated from business logic, enabling independent upgrades.
Flexible Extensibility: New data sources and modules (e.g., hybrid search, multimodal content) can be added easily.
Engineering‑Friendly: Aligns with software‑engineering best practices for team collaboration.
Tool‑Driven Implementation: All functionality (knowledge ingestion, retrieval, FAQ handling) is realized through prompts and LLM‑driven tool calls.
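To make the tool‑driven pattern concrete, here is a minimal sketch of how an LLM‑issued tool call can be routed to a registered handler. The registry, decorator, and handler names here are illustrative, not the MCP wire format or this project's actual code:

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to handlers; a real MCP server
# declares its tools over the protocol, but the dispatch idea is the same.
TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function as a callable tool under the given name."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return decorator

@tool("searchKnowledge")
def search_knowledge(query: str) -> Dict[str, Any]:
    # Placeholder: a real server would run a vector search here.
    return {"query": query, "results": []}

def dispatch(name: str, arguments: Dict[str, Any]) -> Any:
    """Route an LLM-issued tool call to the registered handler."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

The LLM never touches retrieval code directly; it only emits a tool name plus arguments, which keeps model invocation decoupled from business logic.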
Project Background and Requirements
Modern enterprises face four main knowledge‑management pain points:
Knowledge Fragmentation: Documents are scattered across systems without a unified search entry.
Low Retrieval Efficiency: Keyword search cannot understand semantics, leading to inaccurate results.
Slow Knowledge Updates: Manual curation delays the reflection of the latest information.
High Usage Barrier: Technical jargon and complex query syntax hinder ordinary employees.
To address these issues, the system must satisfy four core requirements:
Intelligent Retrieval: Natural‑language queries that understand intent and context.
Automated Knowledge Processing: Automatic document chunking and FAQ extraction.
Flexible Expansion: Support for multiple data sources and model integrations.
Easy Deployment & Maintenance: Simple architecture that teams can quickly adopt and iterate.
Project Goals
Technical Goals
Build an MCP‑compliant knowledge‑store service and client.
Implement document chunking, FAQ extraction, and vector embedding.
Support complex query decomposition and hybrid retrieval.
Application Goals
Provide a unified knowledge‑base management and retrieval portal.
Achieve >90% retrieval accuracy for internal queries.
Reduce knowledge‑base maintenance workload by 70%.
Enable intelligent processing of all corporate documents.
System Design and Implementation
The design references alibabacloud-tablestore-mcp-server, which is built on Tablestore in Java. For better extensibility, this implementation switches to Milvus for vector storage and rewrites both the server and the client in Python.
The MCP‑based RAG system consists of three core components:
Knowledge‑Store Service (MCP Server): Backend built on Milvus, responsible for document storage and vector retrieval.
MCP Client: Communicates with the server to perform knowledge ingestion and query operations.
LLM Integration: Handles document chunking, FAQ extraction, query decomposition, and answer generation.
Deployment of MCP Server
Prerequisites: Docker, Docker Compose, at least 4 CPU cores, 4 GB RAM, and 20 GB of disk.
# Enter project directory
cd mcp-rag
# Start Milvus and dependencies
docker compose up -d etcd minio standalone
# Create Python virtual environment
python -m venv env-mcp-rag
source env-mcp-rag/bin/activate
# Install dependencies
pip install -r requirements.txt
# Launch the server
python -m app.main
MCP Server Core API
The server exposes four tools:
storeKnowledge: Store raw documents into the knowledge store.
searchKnowledge: Perform similarity search on stored documents.
storeFAQ: Save extracted FAQ pairs into a dedicated FAQ store.
searchFAQ: Retrieve relevant FAQ entries.
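For reference, the four tool contracts above can be described to clients with JSON‑Schema‑style definitions along these lines. The field names (name, description, inputSchema) follow MCP's tool‑listing convention; the parameter schemas themselves are illustrative assumptions, not the repository's actual definitions:

```python
# Illustrative tool descriptors; a real MCP server advertises these to
# clients. The parameter shapes below are assumed for demonstration.
TOOL_DEFINITIONS = [
    {
        "name": "storeKnowledge",
        "description": "Store a raw document in the knowledge store.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "metadata": {"type": "object"},
            },
            "required": ["content"],
        },
    },
    {
        "name": "searchKnowledge",
        "description": "Similarity search over stored documents.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "top_k": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
    {
        "name": "storeFAQ",
        "description": "Save an extracted question-answer pair.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "answer": {"type": "string"},
            },
            "required": ["question", "answer"],
        },
    },
    {
        "name": "searchFAQ",
        "description": "Retrieve relevant FAQ entries for a query.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```

Publishing schemas like these is what lets any MCP‑aware client discover and call the tools without custom integration code.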
Example implementation of storeKnowledge:
async def store_knowledge(self, content: str, metadata: Dict[str, Any] = None) -> Dict[str, Any]:
    """Store knowledge content to Milvus"""
    await self.ready_for_connections()
    try:
        knowledge_content = KnowledgeContent(content=content, metadata=metadata or {})
        self.milvus_service.store_knowledge(knowledge_content)
        return {"status": "success", "message": "Knowledge stored successfully"}
    except Exception as e:
        logger.error(f"Error storing knowledge: {e}")
        return {"status": "error", "message": str(e)}
RAG Client Implementation (MCP Client)
Key steps:
Knowledge‑Base Construction
Text chunking – ensure semantic completeness.
FAQ extraction – generate question‑answer pairs via LLM.
Vectorization – embed chunks and FAQs and store them in Milvus.
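The three construction steps above can be sketched as a single pipeline. Chunking, FAQ extraction, and embedding are stubbed here; the function names are placeholders standing in for the components described in this section, not the project's actual API:

```python
from typing import Dict, List, Tuple

def chunk(text: str, size: int = 1000) -> List[str]:
    # Stub: fixed-size splitting; the real chunker respects sentence ends.
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

def extract_faqs(text: str) -> List[Tuple[str, str]]:
    # Stub: a real implementation asks the LLM for question/answer pairs.
    return []

def embed(text: str) -> List[float]:
    # Stub: a real implementation calls an embedding model.
    return [0.0]

def build_knowledge_base(doc: str) -> Dict[str, int]:
    """Chunk a document, extract FAQs, embed both, and count what was stored."""
    chunks = chunk(doc)
    faqs = extract_faqs(doc)
    vectors = [embed(c) for c in chunks] + [embed(q + " " + a) for q, a in faqs]
    # In the real system each vector would be inserted into Milvus here.
    return {"chunks": len(chunks), "faqs": len(faqs), "vectors": len(vectors)}
```

Keeping the pipeline in this shape means each stage (chunker, extractor, embedder) can be swapped independently, which is the decoupling argument made earlier.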
Text chunking code (excerpt):
def _chunk_text(self, text: str) -> List[str]:
    """Split text into chunks while preserving semantics"""
    chunks = []
    if len(text) <= self.chunk_size:
        chunks.append(text)
        return chunks
    start = 0
    while start < len(text):
        end = start + self.chunk_size
        if end < len(text):
            # Prefer to break at a sentence boundary inside the window
            sentence_end = max(
                text.rfind('. ', start, end),
                text.rfind('? ', start, end),
                text.rfind('! ', start, end)
            )
            if sentence_end > start:
                end = sentence_end + 1
        chunks.append(text[start:min(end, len(text))])
        if end >= len(text):
            break  # tail already covered; avoid emitting a duplicate final chunk
        start = end - self.chunk_overlap
        if start <= 0:
            break  # guards against chunk_overlap >= chunk_size
    return chunks
FAQ extraction prompt (simplified):
system_prompt = """You are a knowledge-extraction expert. Extract up to 10 FAQ items from the given text. Output a JSON array with "question" and "answer" fields only."""
user_prompt = f"""Extract FAQs from the following text:
```
{text}
```"""
response = self.llm_client.sync_generate(prompt=user_prompt, system_prompt=system_prompt, temperature=0.3)
Question decomposition code:
async def _decompose_question(self, question: str) -> List[str]:
    """Break a complex question into simpler sub-questions"""
    system_prompt = """You are a question-analysis expert. Split the input into 2-4 clear sub-questions covering all aspects. Return a JSON array like ["sub-question1", "sub-question2"]."""
    user_prompt = f"""Decompose the following question:
{question}"""
    response = self.llm_client.sync_generate(prompt=user_prompt, system_prompt=system_prompt, temperature=0.3)
    # Parse JSON array from response
    ...
Context filtering (simplified):
async def _filter_context(self, question: str, context_items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Select the most relevant context items for the final answer"""
    seen = set()
    filtered = []
    # FAQ hits are preferred over raw knowledge chunks
    faq_items = [i for i in context_items if i["type"] == "faq"]
    knowledge_items = [i for i in context_items if i["type"] == "knowledge"]
    for item in faq_items + knowledge_items:
        content = item.get("content")
        if content and content not in seen:
            seen.add(content)
            filtered.append(item)
        if len(filtered) >= 6:
            break
    return filtered
Practical Demonstration
Build the knowledge base from a markdown file:
python -m app.main build --file test.md --title "RAG Basics" --author "Enterprise KB" --tags "LLM,RAG,KnowledgeBase"
Sample log output:
2025-05-11 14:50:16 | INFO | app.knowledge_builder:build_from_text:52 - Split text into 2 chunks
2025-05-11 14:50:59 | INFO | app.knowledge_builder:build_from_text:72 - Extracted 8 FAQs from text
2025-05-11 14:51:00 | INFO | __main__:build_knowledge_base:48 - Stored 2/2 chunks to knowledge base
2025-05-11 14:51:00 | INFO | __main__:build_knowledge_base:50 - Extracted and stored 8 FAQs
Query the system:
python -m app.main query --question "What advantages and drawbacks does RAG have compared to traditional enterprise knowledge bases?"
Result excerpt:
2025-05-11 15:01:46 | INFO | app.knowledge_retriever:query:39 - Decomposed question into 4 sub‑questions
2025-05-11 15:01:47 | INFO | app.knowledge_retriever:query:67 - Filtered 28 context items to 6
================================================================================
Question: What advantages and drawbacks does RAG have compared to a traditional enterprise knowledge base?
--------------------------------------------------------------------------------
Answer: Retrieval‑Augmented Generation (RAG) allows LLMs to dynamically access up‑to‑date internal knowledge, improving relevance, accuracy, and utility while keeping the model lightweight. It also introduces challenges such as system complexity, latency, and the need for robust retrieval pipelines.
================================================================================
Implementation Recommendations & Best Practices
Document Processing Strategy
Set chunk size to 1000‑1500 characters with 200‑300 character overlap.
Adjust chunking rules for technical vs. narrative documents.
Preserve original metadata (e.g., source, format) to improve retrieval precision.
Retrieval Optimization Techniques
Employ hybrid search (semantic vectors + keyword matching).
Generate 2‑4 sub‑questions during decomposition.
Limit total context items to 5‑8 to avoid information overload.
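One common way to combine the semantic and keyword result lists from hybrid search is reciprocal rank fusion (RRF). This sketch assumes each retriever returns document IDs in rank order; the function name and the example IDs are illustrative:

```python
from collections import defaultdict
from typing import Dict, List

def rrf_merge(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists with RRF: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# A document ranked well by both retrievers rises to the top.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
merged = rrf_merge([vector_hits, keyword_hits])
```

RRF needs no score normalization across retrievers, which makes it a low-risk default when vector similarities and keyword scores live on different scales.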
System Integration Tips
Choose an appropriate embedding model for the domain.
Design incremental indexing for real‑time knowledge updates.
Enable monitoring and logging to quickly detect failures.
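For incremental indexing, a simple approach is to fingerprint each chunk and re-embed only what changed. This sketch keeps the seen-hash set in memory; a real system would persist it alongside the Milvus collection, and the function names here are hypothetical:

```python
import hashlib
from typing import Iterable, List, Set

def content_hash(text: str) -> str:
    """Stable fingerprint for a chunk of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def new_chunks(chunks: Iterable[str], seen: Set[str]) -> List[str]:
    """Return only chunks not indexed before, updating the seen set in place."""
    fresh = []
    for chunk in chunks:
        h = content_hash(chunk)
        if h not in seen:
            seen.add(h)
            fresh.append(chunk)  # only these need embedding + insertion
    return fresh

seen_hashes: Set[str] = set()
first = new_chunks(["intro", "pricing"], seen_hashes)
second = new_chunks(["intro", "pricing v2"], seen_hashes)  # only the changed chunk
```

On a re-ingest, unchanged chunks are skipped entirely, so embedding cost scales with the edit size rather than the document size.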
Conclusion & Outlook
Using MCP to build a RAG system resolves many pain points of traditional pipelines—tight coupling, single‑strategy retrieval, and high maintenance cost—while offering a standardized, extensible framework for enterprise knowledge management. Future directions include multimodal content support, real‑time knowledge sync mechanisms, and adaptive retrieval tuned by user feedback.
References
Model Context Protocol (MCP) official documentation – https://modelcontextprotocol.io/introduction
Milvus vector database visual client Attu – https://milvus.io/docs/zh/quickstart_with_attu.md
MCP‑RAG practical code repository – https://github.com/FlyAIBox/mcp-in-action/tree/rag_0.1.1/mcp-rag
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.