Building Agentic RAG with LlamaIndex: From Tool Agents to a Top Agent
This article walks through the design and implementation of an Agentic Retrieval‑Augmented Generation system using LlamaIndex, showing how to wrap multiple RAG engines as tools, orchestrate them with hierarchical AI agents, and scale the solution with tool retrieval for large document collections.
Background and Motivation
Classic Retrieval‑Augmented Generation (RAG) pipelines retrieve relevant chunks from vector stores and feed them to a large language model (LLM) for answer synthesis. While effective for simple fact‑finding, they struggle with enterprise scenarios that require global document understanding, cross‑document comparison, or integration with external tools.
Agentic RAG Concept
Agentic RAG introduces AI agents that plan and coordinate multiple specialized RAG engines (tools). A Tool Agent encapsulates a set of RAG engines for a single document or knowledge base, while a Top Agent manages all Tool Agents, selecting and invoking the appropriate tools to answer complex queries.
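The two-level structure can be sketched in plain Python without any LLM. This is only an illustration, not LlamaIndex code: the classes `Tool`, `ToolAgent`, and `TopAgent`, the keyword-based routing, and the stub engines are hypothetical stand-ins for decisions that an LLM-driven agent would make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A named callable with a description, standing in for a RAG engine."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolAgent:
    """Owns the tools for ONE document; a keyword rule replaces LLM planning."""

    def __init__(self, doc_name: str, tools: List[Tool]):
        self.doc_name = doc_name
        self.tools = {t.name: t for t in tools}

    def query(self, question: str) -> str:
        # Hypothetical rule: summary-style questions go to the summary tool.
        wants_summary = any(
            w in question.lower() for w in ("summary", "summarize", "overview")
        )
        tool = self.tools["summary_tool" if wants_summary else "query_tool"]
        return tool.run(question)


class TopAgent:
    """Routes a question to every Tool Agent whose document name it mentions."""

    def __init__(self, agents: Dict[str, ToolAgent]):
        self.agents = agents

    def chat(self, question: str) -> List[str]:
        q = question.lower()
        selected = [a for name, a in self.agents.items() if name in q]
        return [agent.query(question) for agent in selected]


def make_agent(doc_name: str) -> ToolAgent:
    # Stub engines that only report which tool answered.
    return ToolAgent(doc_name, [
        Tool("query_tool", f"details about {doc_name}",
             lambda q, d=doc_name: f"[{d}/query_tool] {q}"),
        Tool("summary_tool", f"summary of {doc_name}",
             lambda q, d=doc_name: f"[{d}/summary_tool] {q}"),
    ])


top = TopAgent({n: make_agent(n) for n in ("c-rag", "self-rag", "kg-rag")})
answers = top.chat("Give me a summary of c-rag")
print(answers)
```

In a real system both routing decisions (which Tool Agent, then which tool) are made by the LLM from the tool descriptions rather than by substring checks.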
Implementation with LlamaIndex
The following steps demonstrate a concrete implementation using LlamaIndex (compatible with LangChain).
1. Prepare Test Documents
names = ['c-rag', 'self-rag', 'kg-rag']
files = ['../../data/c-rag.pdf', '../../data/self-rag.pdf', '../../data/kg-rag.pdf']
2. Create a Tool Agent for a Single Document
import os

# Import paths assume the llama-index 0.10+ package layout.
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.agent import ReActAgent
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool

# NOTE: `vector_store` is not defined in the original snippet; it must be
# created beforehand as a LlamaIndex vector store instance.

def create_tool_agent(file, name):
    print(f'Starting to create tool agent for 【{name}】...')
    # Load the document and split it into overlapping chunks (nodes)
    docs = SimpleDirectoryReader(input_files=[file]).load_data()
    splitter = SentenceSplitter(chunk_size=500, chunk_overlap=50)
    nodes = splitter.get_nodes_from_documents(docs)

    # Vector index (persistent): build on first run, reload afterwards
    if not os.path.exists(f"./storage/{name}"):
        print('Creating vector index...')
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
        vector_index.storage_context.persist(persist_dir=f"./storage/{name}")
    else:
        print('Loading vector index...')
        storage_context = StorageContext.from_defaults(persist_dir=f"./storage/{name}", vector_store=vector_store)
        vector_index = load_index_from_storage(storage_context=storage_context)
    query_engine = vector_index.as_query_engine(similarity_top_k=5)

    # Summary index (rebuilt in memory on every run)
    summary_index = SummaryIndex(nodes)
    summary_engine = summary_index.as_query_engine(response_mode="tree_summarize")

    # Wrap both engines as tools; the descriptions guide tool selection
    query_tool = QueryEngineTool.from_defaults(query_engine=query_engine,
                                               name='query_tool',
                                               description=f'Use if you want to query details about {name}')
    summary_tool = QueryEngineTool.from_defaults(query_engine=summary_engine,
                                                 name='summary_tool',
                                                 description=f'Use ONLY IF you want a holistic summary of {name}.')

    tool_agent = ReActAgent.from_tools([query_tool, summary_tool], verbose=True,
                                       system_prompt=f"""You are a specialized agent designed to answer queries about {name}. You must ALWAYS use at least one of the provided tools; do NOT rely on prior knowledge or fabricate answers.""")
    return tool_agent
This function builds two indexes per document (a vector index for factual queries and a summary index for holistic answers), wraps each as a QueryEngineTool, and creates a ReAct‑style Tool Agent over the two tools.
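To make the "ReAct‑style" wording concrete, here is a minimal sketch of a reason/act/observe loop with a scripted function in place of the LLM. The names `scripted_llm` and `react_loop`, the step format, and the stub tools are hypothetical; this is not LlamaIndex's internal prompt or output parser.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step format: ("action", tool_name, tool_input) or ("answer", text, "").
Step = Tuple[str, str, str]


def scripted_llm(question: str, observations: List[str]) -> Step:
    """Stand-in for an LLM: call one tool first, then answer from its output."""
    if not observations:
        tool = "summary_tool" if "summar" in question.lower() else "query_tool"
        return ("action", tool, question)
    return ("answer", f"Based on the tool output: {observations[-1]}", "")


def react_loop(question: str,
               tools: Dict[str, Callable[[str], str]],
               llm: Callable[[str, List[str]], Step] = scripted_llm,
               max_iterations: int = 5) -> Tuple[str, List[str]]:
    """Alternate between choosing an action and observing its result."""
    observations: List[str] = []
    trace: List[str] = []
    for _ in range(max_iterations):
        kind, first, second = llm(question, observations)
        if kind == "answer":
            trace.append("Answer")
            return first, trace
        # Run the chosen tool and feed its output back as an observation.
        trace.append(f"Action: {first}")
        observations.append(tools[first](second))
    raise RuntimeError("No answer within max_iterations")


# Stub tools that stand in for the vector and summary query engines.
tools = {
    "query_tool": lambda q: "retrieved chunks for: " + q,
    "summary_tool": lambda q: "tree-summarized overview",
}

answer, trace = react_loop("Summarize the paper", tools)
print(trace)
print(answer)
```

The iteration cap matters in practice: without it, an agent that never produces a final answer would keep calling tools indefinitely.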
3. Batch‑Create Tool Agents
print('Creating tool agents for different documents...')
tool_agents_dict = {}
for name, file in zip(names, files):
    tool_agent = create_tool_agent(file, name)
    tool_agents_dict[name] = tool_agent
4. Build the Top Agent
# Import path assumes the separate llama-index-agent-openai package is installed.
from llama_index.agent.openai import OpenAIAgent

# Convert each Tool Agent into a tool
all_tools = []
for name in names:
    agent_tool = QueryEngineTool.from_defaults(
        query_engine=tool_agents_dict[name],
        name=f"tool_{name.replace('-', '')}",
        description=f"Use this tool if you want to answer any questions about {name}."
    )
    all_tools.append(agent_tool)

# Create the Top Agent (using OpenAIAgent for function‑calling)
top_agent = OpenAIAgent.from_tools(tools=all_tools, verbose=True,
                                   system_prompt="""You are an agent designed to answer queries over a set of given papers. Always use the provided tools; do NOT rely on prior knowledge or fabricate answers.""")
The Top Agent receives a user query, selects the appropriate Tool Agent(s), and orchestrates their execution.
Testing the System
top_agent.chat_repl()
When prompted with a question such as "Please introduce Retrieval Evaluator in C‑RAG pattern?", the Top Agent calls the relevant Tool Agent, which in turn uses its query_tool to retrieve factual information and then returns the answer.
Scaling to Hundreds of Documents
Creating a Tool Agent for each document can overwhelm the Top Agent with too many tools, which makes tool selection less reliable and raises token cost, since every tool description is sent to the LLM. To mitigate this, index the tools themselves with LlamaIndex's object index (ObjectIndex), so that only the most relevant tools are retrieved semantically for each query.
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

# Build a tool retriever
obj_index = ObjectIndex.from_objects(all_tools, index_cls=VectorStoreIndex)
tool_retriever = obj_index.as_retriever(similarity_top_k=5, verbose=True)

# Pass the retriever to the Top Agent instead of the full tool list
top_agent = OpenAIAgent.from_tools(
    tool_retriever=tool_retriever,
    verbose=True,
    system_prompt="""You are an agent designed to answer queries over a set of given papers. Always use the tools provided; do NOT rely on prior knowledge."""
)
Now the Top Agent first retrieves a small set of relevant Tool Agents before planning, which keeps the prompt small even with large document collections.
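The idea behind tool retrieval can be illustrated without an embedding model. The sketch below ranks tools by cosine similarity between the query and each tool description, using word counts as a toy substitute for embeddings. The `ToolRetriever` class, the `embed` function, and the tool descriptions are hypothetical and do not reflect how ObjectIndex is implemented.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag of lowercase word counts (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToolRetriever:
    """Indexes tool descriptions once, then returns the top-k tool names per query."""

    def __init__(self, descriptions: Dict[str, str]):
        self.vectors = {name: embed(desc) for name, desc in descriptions.items()}

    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.vectors,
                        key=lambda n: cosine(q, self.vectors[n]),
                        reverse=True)
        return ranked[:top_k]


# Hypothetical descriptions; richer descriptions give the retriever more to match on.
descriptions = {
    "tool_crag": "Corrective RAG: a retrieval evaluator grades retrieved documents",
    "tool_selfrag": "Self-RAG: the model emits reflection tokens to critique itself",
    "tool_kgrag": "Knowledge graph RAG: entities and relations from a graph",
}

retriever = ToolRetriever(descriptions)
selected = retriever.retrieve("How does the retrieval evaluator grade documents?", top_k=1)
print(selected)
```

Because retrieval quality depends entirely on the descriptions, a generic description that differs only in the document name gives the retriever little to work with. Descriptions that summarize each document's content are likely to rank better.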
Conclusion
Agentic RAG extends classic RAG by embedding AI agents that can plan, invoke multiple specialized RAG engines, and collaborate across documents. This architecture offers flexibility for complex knowledge‑intensive tasks such as cross‑document comparison, summarization, and tool‑augmented workflows, while remaining scalable through tool‑level retrieval.
AI Large Model Application Practice
Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
