From RAG to Deep Research Agent: Building a Multi‑Round AI Agent with ReAct

This article walks through the practical differences between simple Retrieval‑Augmented Generation and a full Deep Research Agent, explains the four pillars that support such agents, demonstrates a minimal ReAct implementation with robust error handling, and shares interview tips for showcasing these systems.

Wu Shixiong's Large Model Academy

In the first post of the Deep Research Agent series, the author recounts a recent interview where a candidate was asked to differentiate a standard RAG system from a multi‑round research agent, highlighting the need to stress‑test agents beyond merely getting them to run.

1. RAG vs. Deep Research Agent

RAG is likened to a dictionary lookup—adequate for single‑turn, factoid questions. In contrast, a Deep Research Agent must handle research‑type queries that require many dependent retrieval steps, such as comparing the solid‑state battery strategies of the top five EV manufacturers. The article summarizes four problem categories (single‑turn QA, two‑step reasoning, research‑type, complex analysis) and notes that only the latter truly demand an agent capable of dynamic, multi‑round searches.

2. The Four Pillars of a Deep Research Agent

Training data: Curated examples of end‑to‑end research workflows that the model can learn from.

Agent framework: The reasoning‑and‑acting (ReAct) loop that defines how the model formats thoughts, decides actions, and processes observations.

Training system: Supervised fine‑tuning (SFT) followed by reinforcement learning (RL) to teach the model both format compliance and strategic decision‑making.

Toolset: Search, web‑page extraction (Visit), scholarly retrieval (Scholar), and code execution (Python) that connect the model to external knowledge sources.
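The ReAct loop in the next section expects this toolset as a dict mapping tool names to callables. A minimal sketch with placeholder implementations (the function bodies, and the names `TOOLS` and `python_exec`, are illustrative stand-ins, not the author's actual tools):

```python
import contextlib
import io

def search(query: str) -> str:
    """Placeholder: a real version would call a web search API."""
    return f"[search results for: {query}]"

def visit(url: str) -> str:
    """Placeholder: a real version would fetch the page and extract its text."""
    return f"[page content of: {url}]"

def scholar(query: str) -> str:
    """Placeholder: a real version would query a scholarly search API."""
    return f"[papers matching: {query}]"

def python_exec(code: str) -> str:
    """Execute a short snippet and capture its printed output.
    A production version must sandbox this."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

# Name -> callable: the shape the ReAct loop expects
TOOLS = {"search": search, "visit": visit, "scholar": scholar, "python": python_exec}
```

Keeping the registry as a plain dict makes it trivial to add or mock tools when testing the loop.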

The article focuses on implementing the second pillar—the basic ReAct framework.

3. ReAct: Interleaving Thought, Action, and Observation

The ReAct loop consists of three steps repeated until the model decides it has enough information:

Thought: The model assesses current knowledge and determines the next information gap.

Action: It calls a tool (e.g., a search API) to fill that gap.

Observation: The tool’s result is appended to the conversation history, and the cycle restarts.

This process can iterate dozens of times, enabling deep, multi‑turn research.
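In function-calling APIs, one such round typically appears as two entries in the message history: an assistant message carrying the thought and the tool call, followed by a tool message carrying the observation. An illustrative trace (the values are made up; the field layout follows the OpenAI-style function-calling format):

```python
# One ReAct round as it appears in the message history
round_one = [
    {   # Thought + Action: the model explains itself, then requests a tool
        "role": "assistant",
        "content": "Thought: I still need BYD's solid-state battery timeline.",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "search",
                "arguments": '{"query": "BYD solid-state battery roadmap"}',
            },
        }],
    },
    {   # Observation: the tool result, keyed back to the call by id
        "role": "tool",
        "tool_call_id": "call_1",
        "content": "[search results about BYD's roadmap]",
    },
]
```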

4. Minimal ReAct Implementation

import json

def react_loop(
    user_query: str,
    llm,
    tools: dict,
    max_steps: int = 10
) -> str:
    """Minimal ReAct execution loop for any function‑calling LLM."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query}
    ]
    for step in range(max_steps):
        response = llm.chat(
            messages=messages,
            tools=list(tools.values()),  # the llm wrapper turns callables into tool schemas
            tool_choice="auto"
        )
        if response.tool_calls:
            # Append the assistant message first; every tool_call in it
            # must then get a matching tool response.
            messages.append(response.to_message())
            for tool_call in response.tool_calls:
                tool_name = tool_call.function.name
                tool_args = json.loads(tool_call.function.arguments)
                print(f"[Step {step + 1}] {tool_name}({tool_args})")
                try:
                    observation = tools[tool_name](**tool_args)
                except Exception as e:
                    # Feed the failure back as an observation so the agent
                    # can change course instead of crashing.
                    observation = f"Tool call failed ({type(e).__name__}): {e}. Try an alternative."
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(observation)
                })
        else:
            # No tool call: the model considers its answer complete.
            print(f"[Done] Executed {step + 1} steps")
            return response.content
    # Step budget exhausted: return the last substantive assistant message.
    last_content = next(
        (m["content"] for m in reversed(messages)
         if m["role"] == "assistant" and m["content"]),
        "Reached max steps without a complete answer."
    )
    return f"[Reached {max_steps} steps] {last_content}"

The code works with OpenAI, DeepSeek, Qwen, or any LLM that supports function calling. Wrapping each tool call in try/except keeps the agent from crashing when a tool times out or returns an error; in the author's internal tests, this alone raised the success rate from 78% to 91%.
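The recovery pattern behind that improvement can be isolated into a small helper and exercised on its own (`safe_tool_call` and `flaky_search` are illustrative names, not part of the author's code):

```python
def safe_tool_call(tool, **kwargs) -> str:
    """Run a tool, converting any exception into an observation string
    the model can react to, instead of crashing the agent loop."""
    try:
        return str(tool(**kwargs))
    except Exception as e:
        return f"Tool call failed ({type(e).__name__}): {e}. Try an alternative."

def flaky_search(query: str) -> str:
    """A tool that always fails, standing in for a timed-out search backend."""
    raise TimeoutError("search backend timed out")

# The failure becomes an observation string rather than an unhandled exception
print(safe_tool_call(flaky_search, query="solid-state batteries"))
```

Because the error message names the exception type and suggests an alternative, the model has enough signal to retry with a different tool or query instead of looping on the same failure.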

5. Scaling Issues: Token Limits and Strategy Stagnation

When the agent runs many rounds, the message list grows linearly. At an average of 2,500 tokens per step, ten steps already approach 25,000 tokens, exceeding the comfortable context window of many models. Beyond that threshold, attention to early information fades, leading to mistakes such as mixing EU and US policy data.
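The growth is easy to estimate with back-of-the-envelope arithmetic (2,500 tokens per step is the article's average; the 500-token base prompt is an assumed figure):

```python
def context_tokens(steps: int, tokens_per_step: int = 2500, base: int = 500) -> int:
    """Estimate context size after `steps` ReAct rounds with a
    linearly growing message history."""
    return base + steps * tokens_per_step

for steps in (5, 10, 20):
    print(f"{steps} steps -> ~{context_tokens(steps):,} tokens")
```

Twenty rounds of raw history would already exceed 50,000 tokens under these assumptions, which is why the next section replaces the transcript with a bounded summary.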

Another hidden problem is strategy stagnation: the model may repeatedly issue the same search query because it cannot see that the needed information was already retrieved earlier. The article proposes a fix called IterResearch, which maintains a fixed‑size structured summary of known facts and remaining sub‑questions instead of appending raw observations.
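The article does not show IterResearch's implementation; one plausible shape for such a fixed-size working memory, with invented names, might look like this:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResearchState:
    """Fixed-size working memory in the spirit of IterResearch: the model is
    shown distilled facts and open sub-questions, not the raw transcript."""
    facts: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    max_facts: int = 20

    def update(self, new_fact: str, resolved: Optional[str] = None) -> None:
        """Fold one observation into the summary, capping its size."""
        self.facts.append(new_fact)
        self.facts = self.facts[-self.max_facts:]  # keep only the newest facts
        if resolved in self.open_questions:
            self.open_questions.remove(resolved)

    def render(self) -> str:
        """What the model sees each round instead of the full history."""
        return ("Known facts:\n- " + "\n- ".join(self.facts) +
                "\nOpen questions:\n- " + "\n- ".join(self.open_questions))
```

Because `render()` is bounded by `max_facts`, the prompt stays roughly constant in size no matter how many rounds the agent runs, and the explicit open-questions list makes it visible when a query has already been answered.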

6. System Prompt Design

The system prompt (shown below) defines the agent’s role, output format, and behavioral boundaries. Explicitly instructing the model to flag contradictory sources and to admit when information is unavailable dramatically reduces factual mixing, from 13% down to 6% in the author’s test set.

SYSTEM_PROMPT = """You are a professional research‑oriented AI assistant capable of multi‑round tool calls.

Workflow:
- Analyze the question, identify missing information, and retrieve it step by step.
- Before each tool call, state the purpose (Thought).
- After receiving results, evaluate quality and decide next steps.
- If the same query yields no new info, change angle; avoid redundant searches.

Output format:
- Use a short Thought before each tool call.
- Summarize sources in the final answer.

Boundaries:
- Explicitly point out contradictory information; do not synthesize it.
- If the answer cannot be found, state that honestly.
"""

7. Interview Guidance

The author provides a concise script for answering interview questions about Deep Research Agents: first contrast RAG and agents with a concrete failure example, then explain the ReAct loop and token‑overflow symptoms, describe the error‑handling strategy that boosted success rates, and finally mention the IterResearch approach for mitigating context growth.

8. Conclusion and Next Steps

The minimal ReAct framework is the starting point; upcoming posts will add duplicate‑detection, richer error handling, and token monitoring to produce a production‑ready agent. Readers are encouraged to run the provided code, observe its behavior, and iteratively improve the system prompt based on real failures.

[Figure: Diagram of the four pillars]
[Figure: Token growth illustration]
[Figure: System prompt impact chart]