Building Agentic RAG with LlamaIndex: From Tool Agents to a Top Agent

This article walks through the design and implementation of an Agentic Retrieval‑Augmented Generation system using LlamaIndex, showing how to wrap multiple RAG engines as tools, orchestrate them with hierarchical AI agents, and scale the solution with tool retrieval for large document collections.

AI Large Model Application Practice

Background and Motivation

Classic Retrieval‑Augmented Generation (RAG) pipelines retrieve relevant chunks from vector stores and feed them to a large language model (LLM) for answer synthesis. While effective for simple fact‑finding, they struggle with enterprise scenarios that require global document understanding, cross‑document comparison, or integration with external tools.

Agentic RAG Concept

Agentic RAG introduces AI agents that plan and coordinate multiple specialized RAG engines (tools). A Tool Agent encapsulates a set of RAG engines for a single document or knowledge base, while a Top Agent manages all Tool Agents, selecting and invoking the appropriate tools to answer complex queries.
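
The two-level hierarchy described above can be sketched in plain Python. This is a toy illustration only: the classes `Tool`, `ToolAgent`, and `TopAgent` and the helper `make_agent` are hypothetical names, and simple keyword matching stands in for the LLM reasoning that a real agent would perform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A named callable with a description, standing in for a RAG engine."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolAgent:
    """Owns the tools for one document and picks one per query."""

    def __init__(self, doc_name: str, tools: List[Tool]):
        self.doc_name = doc_name
        self.tools = {t.name: t for t in tools}

    def answer(self, query: str) -> str:
        # Hypothetical rule standing in for LLM reasoning: summary-style
        # questions go to the summary tool, everything else to the query tool.
        keywords = ("summary", "summarize", "overview")
        wants_summary = any(w in query.lower() for w in keywords)
        tool = self.tools["summary_tool" if wants_summary else "query_tool"]
        return tool.run(query)


class TopAgent:
    """Routes a query to every Tool Agent whose document name it mentions."""

    def __init__(self, agents: Dict[str, ToolAgent]):
        self.agents = agents

    def answer(self, query: str) -> List[str]:
        q = query.lower()
        chosen = [agent for name, agent in self.agents.items() if name in q]
        return [agent.answer(query) for agent in chosen]


def make_agent(doc_name: str) -> ToolAgent:
    # The lambdas return canned strings instead of calling real RAG engines.
    return ToolAgent(doc_name, [
        Tool("query_tool", f"details about {doc_name}",
             lambda q, d=doc_name: f"[{d}] facts"),
        Tool("summary_tool", f"summary of {doc_name}",
             lambda q, d=doc_name: f"[{d}] summary"),
    ])


top = TopAgent({name: make_agent(name) for name in ["c-rag", "self-rag", "kg-rag"]})
print(top.answer("Compare the summary of c-rag and kg-rag"))
```

The point of the sketch is the shape of the call chain: the Top Agent only decides which document agents to involve, and each Tool Agent decides which of its own engines fits the question.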

Implementation with LlamaIndex

The following steps demonstrate a concrete implementation using LlamaIndex (compatible with LangChain).

1. Prepare Test Documents

names = ['c-rag','self-rag','kg-rag']
files = ['../../data/c-rag.pdf','../../data/self-rag.pdf','../../data/kg-rag.pdf']

2. Create a Tool Agent for a Single Document

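import os

# Import paths assume the llama-index 0.10+ package layout (llama_index.core).
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.agent import ReActAgent
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool

# NOTE: `vector_store` is not defined in this excerpt; it is assumed to be a
# vector store object created before this function is called.

def create_tool_agent(file, name):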
    print(f'Starting to create tool agent for 【{name}】...')
    docs = SimpleDirectoryReader(input_files=[file]).load_data()
    splitter = SentenceSplitter(chunk_size=500, chunk_overlap=50)
    nodes = splitter.get_nodes_from_documents(docs)

    # Vector index (persistent)
    if not os.path.exists(f"./storage/{name}"):
        print('Creating vector index...')
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
        vector_index.storage_context.persist(persist_dir=f"./storage/{name}")
    else:
        print('Loading vector index...')
        storage_context = StorageContext.from_defaults(persist_dir=f"./storage/{name}", vector_store=vector_store)
        vector_index = load_index_from_storage(storage_context=storage_context)

    query_engine = vector_index.as_query_engine(similarity_top_k=5)
    summary_index = SummaryIndex(nodes)
    summary_engine = summary_index.as_query_engine(response_mode="tree_summarize")

    query_tool = QueryEngineTool.from_defaults(query_engine=query_engine,
        name='query_tool',
        description=f'Use if you want to query details about {name}')
    summary_tool = QueryEngineTool.from_defaults(query_engine=summary_engine,
        name='summary_tool',
        description=f'Use ONLY IF you want a holistic summary of {name}.')

    tool_agent = ReActAgent.from_tools([query_tool, summary_tool], verbose=True,
        system_prompt=f"""You are a specialized agent designed to answer queries about {name}. You must ALWAYS use at least one of the provided tools; do NOT rely on prior knowledge or fabricate answers.""")
    return tool_agent

This function builds two indexes per document: a vector index for detailed factual queries and a summary index for holistic answers. It then wraps both query engines as QueryEngineTool objects and creates a ReAct‑style Tool Agent that chooses between them.
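
To make the difference between the two indexes concrete, here is a minimal sketch under simplifying assumptions. The functions `top_k_nodes` and `all_nodes` are hypothetical stand-ins, not LlamaIndex APIs, and word overlap replaces embedding similarity. The idea it illustrates is that a vector-style query keeps only the k best-matching nodes, while a summary-style query hands every node to answer synthesis.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word set, used as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_nodes(nodes: List[str], query: str, k: int) -> List[str]:
    """Vector-index stand-in: rank nodes by word overlap, keep the best k."""
    q = tokens(query)
    ranked = sorted(nodes, key=lambda n: len(tokens(n) & q), reverse=True)
    return ranked[:k]


def all_nodes(nodes: List[str], query: str) -> List[str]:
    """Summary-index stand-in: every node goes to synthesis, in order."""
    return list(nodes)


nodes = [
    "The retrieval evaluator scores each retrieved document.",
    "Low-scoring documents trigger a web search.",
    "Experiments cover four datasets.",
]

print(top_k_nodes(nodes, "What does the retrieval evaluator do?", k=1))
print(len(all_nodes(nodes, "Summarize the paper")))
```

This is why the summary tool is described as "ONLY IF you want a holistic summary": touching every node costs far more tokens than a top-k lookup.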

3. Batch‑Create Tool Agents

print('Creating tool agents for different documents...')
tool_agents_dict = {}
for name, file in zip(names, files):
    tool_agent = create_tool_agent(file, name)
    tool_agents_dict[name] = tool_agent

4. Build the Top Agent

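# OpenAIAgent lives in the separate llama-index-agent-openai package
from llama_index.agent.openai import OpenAIAgent

# Convert each Tool Agent into a tool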
all_tools = []
for name in names:
    agent_tool = QueryEngineTool.from_defaults(
        query_engine=tool_agents_dict[name],
        name=f"tool_{name.replace('-', '')}",
        description=f"Use this tool if you want to answer any questions about {name}."
    )
    all_tools.append(agent_tool)

# Create the Top Agent (using OpenAIAgent for function‑calling)
top_agent = OpenAIAgent.from_tools(tools=all_tools, verbose=True,
    system_prompt="""You are an agent designed to answer queries over a set of given papers. Always use the provided tools; do NOT rely on prior knowledge or fabricate answers.""")

The Top Agent receives a user query, selects the appropriate Tool Agent(s), and orchestrates their execution.

Testing the System

top_agent.chat_repl()

When prompted with a question such as "Please introduce the Retrieval Evaluator in the C‑RAG pattern", the Top Agent calls the relevant Tool Agent, which in turn uses its query_tool to retrieve factual information and then returns the answer.

Scaling to Hundreds of Documents

Creating a Tool Agent for each document can overwhelm the Top Agent with too many tools, which makes tool selection less reliable and increases token cost. To mitigate this, index the tools themselves with LlamaIndex's ObjectIndex, so that only the tools most semantically relevant to each query are retrieved and offered to the agent.

from llama_index.core.objects import ObjectIndex

# Build a tool retriever
obj_index = ObjectIndex.from_objects(all_tools, index_cls=VectorStoreIndex)
tool_retriever = obj_index.as_retriever(similarity_top_k=5, verbose=True)

# Pass the retriever to the Top Agent instead of the full tool list
top_agent = OpenAIAgent.from_tools(
    tool_retriever=tool_retriever,
    verbose=True,
    system_prompt="""You are an agent designed to answer queries over a set of given papers. Always use the tools provided; do NOT rely on prior knowledge."""
)

Now the Top Agent first retrieves a small set of relevant Tool Agents before planning, preserving performance even with large document collections.
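
The retrieval step can be illustrated with a small standard-library sketch. It is not how ObjectIndex is implemented: the function `retrieve_tools` is a hypothetical name, and bag-of-words cosine similarity stands in for embedding similarity. It only shows the idea of ranking tool descriptions against the query and passing the top few to the agent.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Word counts, used here as a crude substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(descriptions: Dict[str, str], query: str, top_k: int) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = bag_of_words(query)
    ranked = sorted(
        descriptions,
        key=lambda name: cosine(bag_of_words(descriptions[name]), q),
        reverse=True,
    )
    return ranked[:top_k]


# Descriptions follow the pattern used when wrapping Tool Agents as tools.
descriptions = {
    "tool_crag": "Use this tool if you want to answer any questions about c-rag.",
    "tool_selfrag": "Use this tool if you want to answer any questions about self-rag.",
    "tool_kgrag": "Use this tool if you want to answer any questions about kg-rag.",
}

print(retrieve_tools(descriptions, "How does kg-rag build its graph?", top_k=1))
```

Because retrieval quality depends entirely on the descriptions, each tool description should name its document clearly and distinctly.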

Conclusion

Agentic RAG extends classic RAG by embedding AI agents that can plan, invoke multiple specialized RAG engines, and collaborate across documents. This architecture offers flexibility for complex knowledge‑intensive tasks such as cross‑document comparison, summarization, and tool‑augmented workflows, while remaining scalable through tool‑level retrieval.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

AI Large Model Application Practice

Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
