Agentic RAG Deep Dive: Letting the Agent Decide When and How Often to Retrieve
The article analyzes the shortcomings of traditional one‑shot RAG pipelines, introduces four Agentic RAG patterns that let an LLM‑driven agent control retrieval strategy, source selection, query rewriting, and retry limits, and provides concrete TypeScript implementations with LangGraph along with practical pitfalls.
Why Traditional RAG Is a One‑Shot Bet
Standard RAG follows a fixed pipeline: user question → embedding → single vector search (top‑K) → prompt → generation. This design suffers from three fatal flaws: (1) a single retrieval with no fallback, so low‑quality results produce confident but wrong answers; (2) reliance on a single data source, creating knowledge silos when questions span documents, databases, and real‑time APIs; (3) inability to decompose complex queries that require multiple independent lookups. Production statistics show that 15‑30% of RAG failures stem from retrieval‑quality issues that the traditional architecture cannot detect or fix.
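For contrast, here is a minimal sketch of that fixed pipeline (illustrative only; it assumes an `llm` and a populated `vectorStore` like the ones used in the CRAG example later in the article):
// Minimal one-shot RAG: a single top-K retrieval, then one generation.
// No scoring, rewrite, or retry: if retrieval misses, the answer is built on bad context.
async function oneShotRAG(question: string): Promise<string> {
  const docs = await vectorStore.similaritySearch(question, 5); // single retrieval, no fallback
  const prompt = `Answer using only these documents:
${docs.map(d => d.pageContent).join("\n---\n")}
Question: ${question}`;
  const response = await llm.invoke(prompt); // one shot, the answer is returned as-is
  return response.content as string;
}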
Four Agentic RAG Modes
Agentic RAG replaces the static pipeline with an LLM‑driven agent that decides retrieval actions at runtime. The four composable modes are:
Routing RAG: The agent interprets intent and routes the query to the appropriate backend (vector store, SQL database, or web search).
Multi‑step RAG: The agent breaks a complex question into sub‑questions, performs separate retrievals, and aggregates the results (a minimal sketch follows this list).
Corrective RAG (CRAG): After retrieval, the agent scores the documents; if they are irrelevant or missing, it rewrites the query and retries, falling back to web search when needed.
Adaptive RAG: The agent first decides whether retrieval is necessary at all, avoiding unnecessary calls for questions that can be answered from context.
Each mode targets a specific pain point and can be combined in real projects.
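Routing, Corrective, and Adaptive RAG each get dedicated code sections below; Multi‑step RAG does not, so here is a minimal sketch of the decompose‑retrieve‑aggregate loop. It assumes the `llm` and `vectorStore` used in the CRAG example, and the `SubQuestionsSchema` is illustrative rather than from the original article. Retrievals run serially because later sub‑questions may depend on earlier ones (see the pitfalls section):
import { z } from "zod";

// Illustrative schema for the decomposition step.
const SubQuestionsSchema = z.object({
  sub_questions: z.array(z.string()).describe("2-4 independently retrievable sub-questions"),
});

async function multiStepRAG(question: string): Promise<string> {
  // 1. Decompose the complex question into sub-questions.
  const planner = llm.withStructuredOutput(SubQuestionsSchema);
  const { sub_questions } = await planner.invoke(
    `Break the following question into 2-4 retrievable sub-questions:\n${question}`
  );
  // 2. Retrieve for each sub-question, serially to respect possible dependencies.
  const contexts: string[] = [];
  for (const sq of sub_questions) {
    const docs = await vectorStore.similaritySearch(sq, 3);
    contexts.push(`Sub-question: ${sq}\n${docs.map(d => d.pageContent).join("\n")}`);
  }
  // 3. Aggregate all retrieved context into a single answer.
  const response = await llm.invoke(
    `Answer the original question using the context below.\nQuestion: ${question}\nContext:\n${contexts.join("\n---\n")}`
  );
  return response.content as string;
}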
CRAG in Practice: Adding a Scorer
The pattern with the most engineering value is CRAG, which adds a scoring node between retrieval and generation. The implementation below uses LangGraph and TypeScript.
import { Annotation, StateGraph, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";
// Define graph state
const AgenticRAGState = Annotation.Root({
  question: Annotation<string>(),
  documents: Annotation<string[]>({
    // Replace rather than append, so rewriteQueryNode can clear stale documents
    // before the next retrieval attempt.
    reducer: (_prev, next) => next,
    default: () => [],
  }),
  generation: Annotation<string>(),
  retrieval_grade: Annotation<"relevant" | "irrelevant" | "none">(),
  retry_count: Annotation<number>({
    reducer: (_prev, next) => next,
    default: () => 0,
  }),
});
type RAGState = typeof AgenticRAGState.State;
const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });
// Retrieval node (`vectorStore` is assumed to be an already-initialized vector store)
async function retrieveNode(state: RAGState) {
  const results = await vectorStore.similaritySearch(state.question, 5);
  return { documents: results.map(d => d.pageContent) };
}
// Scoring node
const GradeSchema = z.object({
  score: z.enum(["relevant", "irrelevant", "none"]),
  reason: z.string(),
});
async function gradeDocumentsNode(state: RAGState) {
  const structuredLLM = llm.withStructuredOutput(GradeSchema);
  const prompt = `You are a retrieval scorer.
User question: ${state.question}
Retrieved docs:
${state.documents.join("\n---\n")}
Evaluate relevance:
- relevant: highly relevant, can answer reliably
- irrelevant: weakly related but contains some info
- none: unrelated or empty
Provide a score and a short reason.`;
  const result = await structuredLLM.invoke(prompt);
  return { retrieval_grade: result.score };
}
// Query rewrite node
async function rewriteQueryNode(state: RAGState) {
  const prompt = `Original question: ${state.question}
The retrieval results are poor. Rewrite the question with more keywords, less ambiguity, and no explanation.`;
  const response = await llm.invoke(prompt);
  return {
    question: response.content as string,
    documents: [], // clear stale documents before the retry
    retry_count: state.retry_count + 1,
  };
}
// Generation node
async function generateNode(state: RAGState) {
  const context = state.documents.join("\n");
  const prompt = `Answer the question based on the following documents.
Documents:
${context}
Question:
${state.question}
Provide a factual answer and state if the docs are insufficient.`;
  const response = await llm.invoke(prompt);
  return { generation: response.content as string };
}
// Routing after grading
const MAX_RETRY = 2;
function routeAfterGrading(state: RAGState): string {
  if (state.retrieval_grade === "relevant") return "generate";
  if (state.retry_count >= MAX_RETRY) return "generate"; // fallback after limit
  if (state.retrieval_grade === "none") return "rewrite_query";
  return "generate"; // "irrelevant" but still generate
}
// Assemble graph
const workflow = new StateGraph(AgenticRAGState)
  .addNode("retrieve", retrieveNode)
  .addNode("grade_documents", gradeDocumentsNode)
  .addNode("rewrite_query", rewriteQueryNode)
  .addNode("generate", generateNode)
  .addEdge("__start__", "retrieve")
  .addEdge("retrieve", "grade_documents")
  .addConditionalEdges("grade_documents", routeAfterGrading, {
    generate: "generate",
    rewrite_query: "rewrite_query",
  })
  .addEdge("rewrite_query", "retrieve")
  .addEdge("generate", END);
const app = workflow.compile();
// Example invocation
const result = await app.invoke({ question: "What is LangGraph's Checkpoint?" });
console.log("Final answer:", result.generation);

This three‑step augmentation (retrieval → scoring → optional rewrite) gives RAG self‑correction capability.
Routing RAG: Letting the Agent Choose a “Library”
In Routing RAG each data source is wrapped as a tool that the LLM can invoke. The three example tools are:
import { tool } from "@langchain/core/tools";
import { z } from "zod";
// Vector store search tool
const searchDocsTool = tool(async ({ query }: { query: string }) => {
  const results = await vectorStore.similaritySearch(query, 5);
  return results.map(d => d.pageContent).join("\n");
}, {
  name: "search_docs",
  description: "Search product docs, specs, and guides. Use for functional or procedural questions.",
  schema: z.object({ query: z.string() }),
});
// SQL database tool
const queryDatabaseTool = tool(async ({ sql }: { sql: string }) => {
  const result = await db.query(sql); // `db` is assumed to be an existing database client
  return JSON.stringify(result.rows);
}, {
  name: "query_database",
  description: "Query the business database. Use for revenue, user count, retention, etc. Only SELECT statements are allowed.",
  schema: z.object({ sql: z.string().describe("Read‑only SELECT statement") }),
});
// Web search tool
const webSearchTool = tool(async ({ query }: { query: string }) => {
  const results = await tavilySearch(query); // `tavilySearch` is assumed to be a configured web-search helper
  return results.map(r => r.content).join("\n");
}, {
  name: "web_search",
  description: "Search the internet for up‑to‑date information, current events, or external market data.",
  schema: z.object({ query: z.string() }),
});
// Bind tools to the LLM so it can route automatically
const agentWithTools = llm.bindTools([searchDocsTool, queryDatabaseTool, webSearchTool]);

Clear tool descriptions are crucial; vague descriptions cause the model to over‑select the first tool.
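The article stops at binding the tools. As a rough sketch (assuming the three tools and `agentWithTools` above; `routeAndRetrieve` is an illustrative helper), the actual routing step reads the model's tool calls and executes whichever tool it selected:
// Minimal dispatch loop: ask the model, then run the tool(s) it chose.
const toolsByName: Record<string, any> = {
  search_docs: searchDocsTool,
  query_database: queryDatabaseTool,
  web_search: webSearchTool,
};

async function routeAndRetrieve(question: string): Promise<string> {
  const aiMessage = await agentWithTools.invoke(question);
  // No tool call means the model answered directly from its own knowledge.
  if (!aiMessage.tool_calls?.length) return aiMessage.content as string;
  const observations: string[] = [];
  for (const call of aiMessage.tool_calls) {
    const selectedTool = toolsByName[call.name];
    observations.push(await selectedTool.invoke(call.args)); // each tool returns a string
  }
  return observations.join("\n---\n");
}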
Adaptive RAG: Deciding Whether to Retrieve
Adaptive RAG adds a routing decision before any retrieval. The agent classifies the question as one of three strategies and provides a reason.
const RouteSchema = z.object({
  datasource: z.enum(["vectorstore", "web_search", "direct_answer"]),
  reason: z.string(),
});
async function routeQuestion(state: RAGState) {
  const structuredLLM = llm.withStructuredOutput(RouteSchema);
  const prompt = `You are a question router. Choose the most efficient strategy.
Question: ${state.question}
Options:
- direct_answer: answer from existing context, no retrieval needed
- vectorstore: internal docs or product knowledge
- web_search: need real‑time or external info
Return the chosen option and a short justification.`;
  const result = await structuredLLM.invoke(prompt);
  // Store the decision in state so the conditional edge below can read it
  // (this assumes a `routing_decision` field is added to the graph state).
  return { routing_decision: result.datasource };
}
function routeNode(state: RAGState & { routing_decision?: string }): string {
  const decision = state.routing_decision;
  if (decision === "direct_answer") return "generate";
  if (decision === "web_search") return "web_search";
  return "retrieve"; // default to vectorstore
}

This mode is ideal for general assistants that handle both pure chat and knowledge‑base queries, saving retrieval cost when it is unnecessary.
Common Pitfalls
Missing max‑retry guard: Without a retry limit, a corrective loop can run forever, exhausting tokens.
Overly vague scoring prompt: LLMs tend to label everything "relevant". Provide concrete criteria for "irrelevant".
Parallel execution of dependent sub‑questions: If sub‑question B depends on A, running them in parallel breaks the logical flow. Detect dependencies and serialize when needed.
Indistinguishable tool descriptions: Identical descriptions cause the router to favor the first tool. Emphasize unique use cases and negative examples.
Using large models for scoring: Scoring is a cheap task; a smaller model such as gpt‑4o‑mini with a strict Zod schema is more cost‑effective and stable (sketched just below).
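As a small illustration of the last point (assuming the `GradeSchema` from the CRAG section; the model name is just an example), the grader can be a separate, cheaper model than the one that writes the final answer:
// Dedicated low-cost grader: strict schema, temperature 0, small model.
const graderLLM = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 })
  .withStructuredOutput(GradeSchema);

async function gradeCheaply(question: string, documents: string[]) {
  return graderLLM.invoke(
    `Question: ${question}\nDocs:\n${documents.join("\n---\n")}\nScore the docs as relevant, irrelevant, or none, with a short reason.`
  );
}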
Summary
Traditional RAG’s three fatal flaws are single‑shot retrieval, single data source, and lack of complex‑question decomposition.
Agentic RAG offers four patterns—Routing, Multi‑step, Corrective (CRAG), and Adaptive—each solving a specific limitation.
CRAG is the highest‑value entry point: add a scoring node, a query‑rewrite node, and a retry‑limit guard to achieve self‑correction.
Never omit a max‑retry counter; it is the safety net that prevents infinite loops.
Tool descriptions matter more than code for accurate routing; spell out when each tool applies and give counter‑examples.
James' Growth Diary
I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Both aim to help you build a solid foundation for the AI era.