Unlocking Modern AI Application Architecture: From RAG to Agents and MCP
This article surveys the evolution of AI applications, explains large language model fundamentals, outlines architectural challenges, and introduces three core patterns—Retrieval‑Augmented Generation (RAG), autonomous Agents, and Model Context Protocol (MCP)—while providing practical LangChain code snippets and integration guidance.
Large Model Application Architecture Basics
Artificial intelligence applications have progressed through several pivotal stages, each marking a major shift in technical paradigms.
AI Application Evolution Overview
Large language models (LLMs) are now the core component of modern AI solutions, but they possess distinct technical characteristics and capability boundaries that must be understood for effective architecture design.
Large Language Model Fundamentals
LLMs serve as the central engine of AI applications; grasping their strengths and limits is essential for building robust systems.
AI Application Architecture Challenges
Despite their power, LLM‑based systems face multiple architectural hurdles, including knowledge staleness, hallucinations, domain‑specific depth, transparency, and private‑knowledge integration.
Emerging Architectural Patterns
These challenges have given rise to three complementary patterns—Retrieval‑Augmented Generation (RAG), Agent‑based decision‑execution, and Model Context Protocol (MCP)—which together form a modern AI application architecture that overcomes the native limitations of LLMs.
Modern AI Application Architecture Framework
The framework is multi‑layered and modular, consisting of the following key tiers:
Document Processing System
Embedding Model
Vector Store
Retrieval Augmentation System
Generative Model
Post‑processing System
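The tiers above compose into a single pipeline. As an orientation aid, here is a toy sketch of that composition; every function is a deliberately simplified placeholder (the "embedding" and "retrieval" are length-based stand-ins, not real semantic search), not an API from any specific framework.

```python
# Minimal sketch of the layered pipeline; all functions are illustrative placeholders.
def process_documents(raw_docs):
    # Document Processing System: clean and drop empty chunks
    return [d.strip() for d in raw_docs if d.strip()]

def embed(chunks):
    # Embedding Model: map chunks to vectors (toy length-based stand-in)
    return {c: [float(len(c))] for c in chunks}

def retrieve(index, query, top_k=2):
    # Vector Store + Retrieval Augmentation: rank chunks by a toy distance
    return sorted(index, key=lambda c: abs(len(c) - len(query)))[:top_k]

def generate_answer(query, context):
    # Generative Model: a template here; a real system calls an LLM
    return f"Q: {query}\nContext: {'; '.join(context)}"

def postprocess(answer):
    # Post-processing System: e.g. append citations, filter content
    return answer + "\n[sources attached]"

chunks = process_documents(["  What is RAG?  ", "", "RAG combines retrieval and generation."])
index = list(embed(chunks).keys())
answer = postprocess(generate_answer("What is RAG?", retrieve(index, "What is RAG?")))
print(answer)
```

A production system swaps each placeholder for a real component (embedding model, vector database, LLM), but the data flow stays the same.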
Subsequent sections dive deeper into RAG, Agent, and MCP.
RAG
Basic Concept
RAG (Retrieval‑Augmented Generation) combines retrieval and generation, pulling relevant information from external knowledge bases to supplement LLM knowledge, thereby producing more accurate and up‑to‑date responses.
Problems Solved by RAG
Knowledge update: connects to real‑time external sources.
Model hallucination: provides factual grounding.
Domain expertise: accesses specialized corpora.
Transparency & traceability: reveals source documents.
Private knowledge: enables proprietary knowledge bases.
Core Components
Document Processing System : cleans, chunks, extracts metadata, and normalizes raw documents. Tools include LangChain loaders, LlamaIndex parsers, Unstructured, PyPDF2, NLTK, spaCy.
Embedding Model : converts text to dense vectors for semantic search. Options: OpenAI text‑embedding‑ada, Cohere Embed, BAAI/bge‑large, Jina embeddings.
Vector Store : stores vectors and provides similarity search. Options: Pinecone, Weaviate, Milvus, ChromaDB, FAISS, Qdrant.
Retrieval Augmentation System : query rewriting, hybrid retrieval, re‑ranking (HyDE, Cohere Rerank, semantic routing).
Generative Model : generates answers using retrieved context. Options: OpenAI GPT‑4, Anthropic Claude, Cohere Command, open‑source Mistral, Llama 3, DeepSeek.
Post‑processing System : fact‑checking, citation, formatting, hallucination detection, content filtering.
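As a small illustration of the post-processing stage, the sketch below attaches numbered source citations to a generated answer so results stay traceable; the function and field names are invented for illustration, not from any particular library.

```python
def attach_citations(answer: str, sources: list) -> str:
    # Post-processing: append numbered citations so answers are traceable
    if not sources:
        return answer
    citations = "\n".join(
        f"[{i + 1}] {s['title']} ({s['url']})" for i, s in enumerate(sources)
    )
    return f"{answer}\n\nSources:\n{citations}"

result = attach_citations(
    "RAG grounds answers in retrieved documents.",
    [{"title": "Internal RAG guide", "url": "https://example.com/rag"}],
)
print(result)
```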
Building RAG Quickly with LangChain
LangChain simplifies connecting LLMs with external data sources and provides reusable components for the entire RAG pipeline.
```python
import sqlite3

import docx
import jieba
import requests
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

DB_FILE = "documents.db"  # SQLite database with an FTS5 table named "documents"

def read_word_document(file_path):
    # Document processing: extract non-empty paragraphs from a .docx file
    doc = docx.Document(file_path)
    return "\n".join(p.text.strip() for p in doc.paragraphs if p.text.strip())

def split_text(text, chunk_size=100, chunk_overlap=10):
    # Chunking: split on newlines and Chinese sentence-ending punctuation
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap, separators=["\n", "。", "?", "!"]
    )
    return splitter.split_text(text)

def load_embedding_model(model_name="moka-ai/m3e-base"):
    # Embedding model: normalized sentence embeddings on CPU
    return HuggingFaceEmbeddings(
        model_name=model_name,
        model_kwargs={"device": "cpu"},
        encode_kwargs={"normalize_embeddings": True},
    )

def store_to_vector_db(docs, db_path="faiss_index"):
    # Vector store: embed documents and persist a FAISS index locally
    vector_db = FAISS.from_documents(docs, load_embedding_model())
    vector_db.save_local(db_path)

def load_vector_db(db_path="faiss_index"):
    # Reload a persisted FAISS index with the same embedding model
    return FAISS.load_local(db_path, load_embedding_model(), allow_dangerous_deserialization=True)

def search_similar_texts(query, vector_db, top_k=3):
    # Semantic retrieval: top-k nearest chunks by vector similarity
    return [r.page_content for r in vector_db.similarity_search(query, k=top_k)]

def fulltext_search(query):
    # Keyword retrieval: BM25 over SQLite FTS5, tokenized with jieba
    conn = sqlite3.connect(DB_FILE)
    cursor = conn.cursor()
    match_expr = " OR ".join(jieba.cut(query))
    cursor.execute(
        "SELECT ori, bm25(documents) AS score FROM documents "
        "WHERE content MATCH ? ORDER BY score DESC LIMIT 3",
        (match_expr,),
    )
    results = cursor.fetchall()
    conn.close()
    return [item[0] for item in results]

def query_knowledge_base(user_query, index_path="faiss_index"):
    # Hybrid retrieval: simple concatenation of vector and keyword hits
    keyword_results = fulltext_search(user_query)
    vector_results = search_similar_texts(user_query, load_vector_db(index_path))
    return vector_results + keyword_results

def generate(prompt: str, model: str = "deepseek-r1:1.5b") -> str:
    # Generation: call a local Ollama server, returning the full non-streamed response
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.7, "num_predict": 8192},
    }
    response = requests.post(url, json=payload)
    return response.json()["response"]
```

Agent
Basic Concept
An Agent is an autonomous software entity that perceives its environment, makes decisions, and takes actions to achieve specific goals, often following a ReAct (Reason‑Act‑Observe) loop.
Core Components
Reasoning Engine (LLM)
Tool Set (APIs)
Memory (interaction history)
Planner (task decomposition)
Executor (tool invocation)
Observer (result parsing)
Prompt Templates
Feedback Loop (strategy adjustment)
Agent Execution Flow (ReAct)
The user query is interpreted by the LLM, the planner splits the task, the LLM generates action commands, the executor calls tools, the observer feeds results back, and the feedback loop refines the plan until a final answer is produced.
```python
# Core loop of an Agent class; collaborators (memory, planner, executor,
# observer, feedback_loop, llm_engine) are defined elsewhere in the class.
def run(self, task: str) -> str:
    # Memory: record the incoming task
    self.memory.add_message("user", task)
    # Planner: decompose the task into steps
    plan = self.planner.create_plan(task)
    self.memory.save_state("plan", plan)
    completed_steps = []
    for step in plan:
        step_id = step["step_id"]
        description = step["description"]
        tool_name = step.get("tool")
        print(f"Executing step {step_id}: {description}")
        if tool_name:
            # LLM decides how to use the tool
            system_msg = self.system_prompt.format(tools_description=self._format_tools_description())
            messages = [
                {"role": "system", "content": system_msg},
                {"role": "user", "content": f"Please help me with this step: {description}. Use a tool if needed."},
            ]
            response = self.llm_engine.generate(messages)
            self.memory.add_message("assistant", response)
            tool_calls = self._parse_tool_calls(response)
            for tool_call in tool_calls:
                try:
                    result = self.executor.execute_tool(tool_call["tool_name"], **tool_call["parameters"])
                    observation = self.observer.process_result(description, result)
                    step_result = {
                        "step_id": step_id,
                        "description": description,
                        "tool_used": tool_call["tool_name"],
                        "parameters": tool_call["parameters"],
                        "result": result,
                        "observation": observation,
                    }
                    completed_steps.append(step_result)
                    self.memory.add_message("system", f"Tool execution result: {result}")
                    # Feedback loop: re-evaluate the remaining plan
                    done_ids = [cs["step_id"] for cs in completed_steps]
                    remaining_steps = [s for s in plan if s["step_id"] not in done_ids]
                    feedback = self.feedback_loop.evaluate_and_adjust(task, completed_steps, observation, remaining_steps)
                    if feedback.get("needs_adjust", False):
                        plan = list(completed_steps) + feedback.get("new_plan", [])
                        self.memory.save_state("plan", plan)
                        print("Plan adjusted")
                except Exception as e:
                    error_msg = f"Error executing step {step_id}: {e}"
                    print(error_msg)
                    self.memory.add_message("system", error_msg)
        else:
            completed_steps.append({"step_id": step_id, "description": description, "completed": True})
    # Final summary generation
    summary_prompt = (
        f"You helped the user complete the task: {task}. Completed steps:\n"
        f"{json.dumps(completed_steps, ensure_ascii=False, indent=2)}\n"
        "Provide a concise summary."
    )
    summary = self.llm_engine.generate([{"role": "user", "content": summary_prompt}])
    self.memory.add_message("assistant", summary)
    return summary
```

Model Context Protocol (MCP)
Basic Concept
MCP standardizes how LLMs interact with external data sources, services, and tools, enabling structured access to contextual information beyond the model's internal knowledge.
Core Components
MCP Host (runtime manager)
MCP Server (tool registration & request handling)
MCP Client (LLM integration layer)
Tool Provider (implements specific tools)
LLM Integration Layer
MCP vs. Function Call
Function Call focuses on generating structured parameters for predefined functions, while MCP provides a full ecosystem for dynamic tool discovery, registration, and execution.
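To make the contrast concrete, here is a hedged sketch: a Function Call setup binds the model to schemas fixed at build time, while an MCP-style client can enumerate whatever a server currently exposes. The registry and schemas below are invented for illustration, not from any real SDK.

```python
# Function Call style: the tool schema is hard-coded at build time.
FUNCTION_SCHEMAS = [
    {"name": "get_weather", "parameters": {"location": "string", "date": "string"}}
]

# MCP style: tools live in a server-side registry discovered at runtime.
TOOL_REGISTRY = {
    "get_weather": {"endpoint": "/api/weather", "params": ["location", "date"]},
    "create_note": {"endpoint": "/api/notes", "params": ["content"]},
}

def discover_tools(registry: dict) -> list:
    # An MCP client enumerates whatever the server currently exposes
    return sorted(registry)

def add_tool(registry: dict, name: str, endpoint: str, params: list) -> None:
    # New tools become visible to clients without rebuilding them
    registry[name] = {"endpoint": endpoint, "params": params}

print(discover_tools(TOOL_REGISTRY))
add_tool(TOOL_REGISTRY, "search_code", "/api/search", ["query"])
print(discover_tools(TOOL_REGISTRY))
```

With static function calling, adding `search_code` would require shipping a new schema to every client; with discovery, clients simply pick it up on the next `list_tools` call.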
Integrating MCP into AI Platforms (Example: Cursor)
Configure a custom MCP server in Cursor to call a GitHub API or a local weather‑forecast tool.
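For a local stdio server such as the FastMCP weather example in the next section, the Cursor-side configuration is typically a small JSON file (commonly `.cursor/mcp.json`); the path and server name below are placeholders, and the exact schema may vary between Cursor versions.

```json
{
  "mcpServers": {
    "weather": {
      "command": "python",
      "args": ["/path/to/weather_server.py"]
    }
  }
}
```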
Building a Simple MCP Server (FastMCP)
```python
import subprocess

from mcp.server.fastmcp import FastMCP

# Initialize FastMCP server
mcp = FastMCP("weather")

@mcp.tool()
async def get_forecast(latitude: float, longitude: float) -> str:
    """Fetch weather forecast for a location"""
    # Mock implementation; a real tool would call a weather API here
    return f"Sunny, 22°C at ({latitude}, {longitude})"

@mcp.tool()
async def create_note(content: str) -> str:
    """Create a new note with the given content (macOS Notes via AppleScript)"""
    subprocess.run(['open', '-a', 'Notes'])
    applescript = f'''\
tell application "Notes"
    activate
    make new note at folder "Notes" with properties {{body:"{content}"}}
end tell
'''
    subprocess.run(['osascript', '-e', applescript])
    return "Note created successfully"

if __name__ == "__main__":
    mcp.run(transport='stdio')
```

HTTP‑Based MCP Example (FastAPI)
```python
from typing import Any, Dict

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class WeatherRequest(BaseModel):
    location: str
    date: str

def generate_mock_weather(location: str, date: str) -> Dict[str, Any]:
    # Placeholder data; swap in a real weather API for production use
    return {"location": location, "date": date, "forecast": "sunny", "temperature": 22}

async def get_weather_data(location: str, date: str) -> Dict[str, Any]:
    return generate_mock_weather(location, date)

@app.post("/api/weather")
async def get_weather(request: WeatherRequest):
    data = await get_weather_data(request.location, request.date)
    return {"status": "success", "data": data}

@app.get("/api/list_tools")
async def list_tools():
    # Tool manifest that LLM clients fetch to discover available tools
    tools = [{
        "name": "get_weather",
        "description": "Get weather forecast for a specific location and date",
        "endpoint": "/api/weather",
        "method": "POST",
        "params": [
            {"name": "location", "type": "string", "description": "The location to get weather for"},
            {"name": "date", "type": "string", "description": "The date to get weather for"},
        ],
    }]
    return {"status": "success", "tools": tools}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Dynamic Tool Loading for LLMs
```python
import asyncio
from typing import Any, Callable, Dict, List

import httpx
from langchain_core.tools import Tool

MCP_SERVER_URL = "http://localhost:8000"

async def fetch_tools_from_server() -> List[Dict[str, Any]]:
    # Discover available tools from the MCP server's manifest endpoint
    async with httpx.AsyncClient() as client:
        response = await client.get(f"{MCP_SERVER_URL}/api/list_tools")
        return response.json()["tools"]

def create_dynamic_tool_executor(tool_info: Dict[str, Any]) -> Callable:
    async def execute_api_call(*args, **kwargs):
        # Map positional and keyword arguments onto the declared parameters
        payload = {}
        param_names = [p["name"] for p in tool_info["params"]]
        for i, arg in enumerate(args):
            if i < len(param_names):
                payload[param_names[i]] = arg
        for k, v in kwargs.items():
            if k in param_names:
                payload[k] = v
        async with httpx.AsyncClient() as client:
            is_post = tool_info["method"] == "POST"
            method = client.post if is_post else client.get
            response = await method(
                f"{MCP_SERVER_URL}{tool_info['endpoint']}",
                json=payload if is_post else None,
                params=None if is_post else payload,
            )
        if response.status_code == 200:
            return response.json()["data"]
        raise Exception(f"API call failed: {response.status_code} {response.text}")

    def sync_executor(*args, **kwargs):
        # LangChain Tool objects expect a synchronous callable
        return asyncio.run(execute_api_call(*args, **kwargs))

    sync_executor.__name__ = tool_info["name"]
    sync_executor.__doc__ = tool_info["description"]
    return sync_executor

def create_tools_from_server_data(tool_data: List[Dict[str, Any]]) -> List[Tool]:
    # Wrap each discovered tool as a LangChain Tool
    return [
        Tool(name=info["name"], func=create_dynamic_tool_executor(info), description=info["description"])
        for info in tool_data
    ]
```

Acknowledgments
This article builds on numerous open‑source AI projects—including LangChain, Ollama, Open WebUI, Dify, and others—that have significantly advanced the AI tooling ecosystem. Sincere thanks go to the developers and communities behind these contributions.
Didi Tech
Official Didi technology account