Artificial Intelligence 37 min read

LLM Agent Design Patterns: From ReAct to Multi‑Agent Collaboration

This article systematically reviews major LLM agent design patterns—including ReAct, CodeAct, static and dynamic planning, reflection, and human‑in‑the‑loop—detailing their core loops, code structures, trade‑offs, and practical use‑cases, and provides a decision tree to help developers choose the most suitable pattern for their tasks.

Sohu Tech Products

Jun 24, 2026

LLM Agent Design Patterns: From ReAct to Multi‑Agent Collaboration

ReAct Pattern – Reasoning + Acting Loop

The ReAct pattern defines a closed‑loop workflow:

Thought → Action → Action Input → PAUSE → Observation → (repeat) → Answer

. The model first reasons about the task (Thought), selects a predefined tool (Action), provides the tool input, pauses for the tool’s result, observes the output, and repeats until a final answer is produced.

Thought : LLM analyses the current state and decides the next step.

Action : Calls an external tool (e.g., a search or calculation API).

Observation : Feeds the tool’s result back to the model.

Loop : Repeats until the model outputs a final answer.

Prompt example (REACT_PROMPT) :

REACT_PROMPT = """
Thought → Action → Action Input → PAUSE → Observation → (repeat) → Answer
Use Thought to describe your thoughts about the question.
Use Action to run one of the actions available to you.
Use Action Input to indicate the input to the Action – then return PAUSE.
Observation will be the result of running those actions.
Your available actions are: {tools}
Rules:
1. If the input is a greeting or goodbye, respond directly without the loop.
2. Otherwise, follow the Thought‑Action loop to find the best answer.
3. If you already have the answer, use your knowledge without external actions.
4. If you need to execute more than one Action, do it on separate calls.
5. At the end, provide a final answer.
"""

Illustrative interaction (stock price comparison) :

User: "比较青岛啤酒和贵州茅台的股票收盘价谁高？"
LLM (Thought): "我需要先获取青岛啤酒的股价。"
Action: get_closing_price
Action Input: {"name": "青岛啤酒"}
PAUSE
Observation: 67.92
LLM (Thought): "现在获取贵州茅台的股价。"
Action: get_closing_price
Action Input: {"name": "贵州茅台"}
PAUSE
Observation: 1488.21
LLM (Final Answer): "贵州茅台的股价更高。"

CodeAct Pattern – Dynamic Code Generation

CodeAct extends ReAct by letting the LLM generate executable Python code on the fly instead of calling predefined functions. The generated code runs in a sandbox (Docker/E2B) and its output is fed back as an Observation.

Prompt example (SYSTEM_PROMPT) :

SYSTEM_PROMPT = """
You are an intelligent assistant that can write and execute code.
When the user asks a question:
1. Analyse the problem and decide what code to write.
2. Write Python code that solves it.
3. Execute the code with the execute_python tool.
4. Analyse the result; if there is an error, fix the code and retry.
5. Finally provide the answer to the user.
"""

Key differences (textual list) :

Tool form: predefined functions (ReAct) vs dynamic code (CodeAct).

Flexibility: limited by preset tools (ReAct) vs arbitrary computation (CodeAct).

Execution environment: local/remote API (ReAct) vs sandboxed Docker/E2B (CodeAct).

Applicable tasks: standard tool calls (ReAct) vs complex algorithms and data analysis (CodeAct).

Plan Mode – Structured Planning

Plan mode separates planning from execution. Two variants are described.

Simple (static) plan : The LLM generates the entire plan as a string once, stores it in state["plan"], and then executes steps autonomously.

Advanced (dynamic) plan : The plan is a mutable list; after each step the LLM updates the remaining steps, allowing fine‑grained control and early termination.

Simple plan workflow (pseudo‑graph):

START → plan_node → execute_node → tool_node → ... → END

Advanced plan workflow (pseudo‑graph):

START → execute → planstep → execute → planstep → ... → END

Simple plan example (stock price comparison):

# First iteration – fetch 青岛啤酒 price
Plan: 1. 获取青岛啤酒的股票收盘价
      2. 获取贵州茅台的股票收盘价
      3. 比较两者并给出结论
Execute step 1 → Observation: 67.92
Update plan → ["获取贵州茅台的股票收盘价", "比较并给出结论"]
# Second iteration – fetch 贵州茅台 price
Execute step 1 → Observation: 1488.21
Update plan → ["比较并给出结论"]
# Third iteration – comparison (no tool needed)
LLM directly returns final answer.

Advanced plan implementation details : PlanState inherits from MessagesState and stores plan (a string) for the simple variant. PlanExecute (TypedDict) holds plan (list of steps), past_steps (record of completed steps), and response (final answer) for the advanced variant. plan_node calls the LLM with PLAN_PROMPT to generate a full plan. execute_node sends the current plan to the LLM, which may emit tool calls. tool_node executes the tool calls and appends ToolMessage observations. plan_step (advanced) invokes a structured‑output model; if the output is a Response, the workflow ends, otherwise the updated plan list is stored. should_end checks whether state["response"] exists to decide termination.

Advanced plan example (same stock task) :

# First loop – fetch 青岛啤酒 price
Plan: ["获取青岛啤酒的股票收盘价", "获取贵州茅台的股票收盘价", "比较并给出结论"]
Execute step 1 → Observation: 67.92
past_steps ← [("获取青岛啤酒的股票收盘价", "67.92")]
Update plan → ["获取贵州茅台的股票收盘价", "比较并给出结论"]
# Second loop – fetch 贵州茅台 price
Execute step 1 → Observation: 1488.21
past_steps ← [..., ("获取贵州茅台的股票收盘价", "1488.21")]
Update plan → ["比较并给出结论"]
# Third loop – comparison step does not need a tool
Model directly returns Response:
{"response": "贵州茅台的股价更贵。"}
Workflow ends.

Reflection Mode – Generate‑Check‑Optimize Loop

Reflection introduces a second “review” agent that checks the primary agent’s output and asks it to improve the answer. The loop runs up to three times or until the reviewer signals that the result is optimal.

generate_command → reflection_check → (if not optimal) generate_command → ... → final output

Key nodes : generate_command: Uses COMMAND_PROMPT on the first pass, then REFLECTION_PROMPT on subsequent passes. reflect_and_optimize: Sends the generated command back to the LLM for safety, efficiency, and POSIX compliance checks. check_reflection: Ends the workflow if the reflection contains “无建议”, “无需优化”, or any stop‑word (e.g., “安全隐患”, “木马”, “攻击”), or after three iterations.

Decision logic (Python‑style) :

if "无建议" in reflection or "无需优化" in reflection:
    END
elif any(stop in reflection for stop in ["安全隐患", "木马", "攻击"]):
    END
elif iterations >= 3:
    END
else:
    generate_command

Human‑in‑the‑Loop (HITL) Mode

When the LLM needs missing information, it can invoke a special ask_user tool. The workflow pauses, the system prompts the real user for input, then resumes from the same checkpoint.

class HumanState(MessagesState):
    query: str

async def llm_node(state: HumanState):
    messages = [SystemMessage(content="你是一个仓库管理员..."),
                HumanMessage(content=state["query"])] + state["messages"]
    response = await llm_with_tools.ainvoke(messages)
    state["messages"].append(response)
    return state

async def human_node(state: HumanState):
    tool_call_id = state["messages"][-1].tool_calls[0]["id"]
    content = interrupt(state["messages"][-1].tool_calls[0]["args"])  # wait for real user input
    tool_message = ToolMessage(content=content, tool_call_id=tool_call_id)
    state["messages"].append(tool_message)
    return state

The graph uses a conditional edge enter_tools that routes to human_node when the tool name is ask_user, otherwise to tool_node. A MemorySaver checkpoint stores the state so that after the user provides input, the workflow resumes exactly where it left off.

Comparative Overview (textual list)

ReAct : Thought‑Action‑Observation loop; high interpretability; multiple LLM calls increase latency.

CodeAct : Generates and runs arbitrary Python code; extreme flexibility; requires sandbox for security.

Plan (simple) : Generates a full static plan; clear execution path; plan cannot adapt to new information.

Plan (advanced) : Mutable plan list updated after each step; fine‑grained control; higher implementation complexity.

Reflection : Generate‑Check‑Optimize loop; higher output quality; at least double LLM calls, higher cost.

Human‑in‑the‑Loop : Pauses for real‑user input; resolves ambiguous requirements; interrupts automation flow.

Decision Tree for Selecting a Pattern

Is human input required?
├── Yes → Human‑in‑the‑Loop
└── No → Does the task need ultra‑high reliability?
    ├── Yes → Reflection
    └── No → Are tools predefined?
        ├── Yes → Are steps fixed?
        │   ├── Yes → Plan (simple)
        │   └── No  → Plan (advanced)
        └── No → Does the task involve complex computation?
            ├── Yes → CodeAct
            └── No  → ReAct

Combination Recommendations (textual list)

Start with ReAct for straightforward tool calls; it offers clear reasoning traces.

Switch to CodeAct when the problem requires custom calculations or data processing beyond predefined APIs.

Use simple planning for static pipelines; adopt advanced planning when steps may change based on intermediate results.

Apply Reflection for security‑sensitive or high‑stakes outputs; limit iterations to control cost.

Integrate Human‑in‑the‑Loop whenever user clarification is essential, using checkpointing to resume seamlessly.

Combine patterns (e.g., Plan + Reflection or CodeAct + HITL) to balance flexibility, safety, and efficiency.

Key Takeaways

Understanding the design philosophy and trade‑offs of each pattern enables developers to build more robust, reliable, and maintainable LLM‑driven agents. Choose the pattern based on task complexity, reliability requirements, and cost constraints.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

LLM ReAct Reflection Agent Planning human-in-the-loop CodeAct

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

ReAct Pattern – Reasoning + Acting Loop

CodeAct Pattern – Dynamic Code Generation

Plan Mode – Structured Planning

Reflection Mode – Generate‑Check‑Optimize Loop

Human‑in‑the‑Loop (HITL) Mode

Comparative Overview (textual list)

Decision Tree for Selecting a Pattern

Combination Recommendations (textual list)

Key Takeaways

Sohu Tech Products

How this landed with the community

Was this worth your time?

0 Comments

ReAct Pattern – Reasoning + Acting Loop