Revisiting Core Agent Patterns: ReAct, Plan‑and‑Solve, and Tree of Thoughts

The article analyzes why simple ReAct loops fail on long‑chain business tasks and explains how Plan‑and‑Solve, Tree of Thoughts, and Graph of Thoughts add planning, search, and state‑machine layers to make complex agents reliable, auditable, and cost‑controlled.


1. ReAct’s Boundary in Long‑Chain Tasks

ReAct (reasoning and acting) lets a model reason, call a tool, observe the result, and repeat. This loop works well for short, interactive queries such as looking up a fact, fixing a code snippet, or judging a single order. In long‑chain tasks, however, steps depend on each other, some can run in parallel, some require human approval, and failures need compensation. Making the next step depend only on the latest observation leads to greedy, local decisions and loss of the global dependency view.
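In code, the loop is tiny, which is both its appeal and its limit. The sketch below is illustrative (the `choose_action` and `run_tool` names are stand-ins, not any framework's API); it makes the greediness visible, since the policy decides each step from the latest observation alone.

```python
# A minimal ReAct loop sketch: reason -> act -> observe, repeated. The next
# action is chosen from ONLY the most recent observation, which is exactly
# what produces greedy, local decisions on long-chain tasks.

def react_loop(task, choose_action, run_tool, max_steps=5):
    observation = task
    transcript = []
    for _ in range(max_steps):
        action, arg = choose_action(observation)  # sees only the latest observation
        transcript.append((action, arg))
        if action == "answer":
            return arg, transcript
        observation = run_tool(action, arg)       # the next decision hinges on this alone
    return None, transcript

# Stubbed "model" and tool for a short interactive query (where ReAct shines):
def choose_action(observation):
    if observation.startswith("lookup:"):
        return ("search", observation.removeprefix("lookup:"))
    return ("answer", observation)

def run_tool(action, arg):
    return {"capital of France": "Paris"}.get(arg, "unknown")

answer, steps = react_loop("lookup:capital of France", choose_action, run_tool)
# answer == "Paris"; two steps: one tool call, one answer
```

For a single lookup this is ideal; nothing in the loop, however, records dependencies between steps, parallelism, or what to do after a failure.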

Four common failure signals of ReAct in complex tasks

Tasks become scattered; the model keeps adding steps without a clear dependency graph.

Tool‑call order is driven by the most recent observation, missing better paths or parallel opportunities.

After a mid‑process failure the model can only keep probing, without clear guidance on retry, rollback, or escalation.

Process logs remain a conversational transcript, hard to turn into an auditable business state.

The first step for complex tasks is to lift the “single chain of actions” into a “task graph”. This is where Plan‑and‑Solve becomes applicable.

2. Plan‑and‑Solve Starts with a Dependency Graph

Plan‑and‑Solve’s core idea is to plan before solving. The model first decomposes the problem into sub‑tasks, specifies the expected output of each, and then executes according to the plan.

In practice the plan should be a structured task‑dependency graph rather than a plain natural‑language list. Each node contains a task ID, inputs, expected outputs, dependencies, available tools, acceptance criteria, and failure handling. This enables the executor to know which nodes can run in parallel and which must wait for upstream results.

Example: Complaint‑handling agent task graph

Receive complaint and identify target

Query customer tier (CRM / membership system)

Verify contract SLA (contract database / clause extraction)

Gather fault evidence (work order / logs / monitoring)

Generate compensation‑plan candidates (depends on tier, SLA, evidence)

Submit for approval or hand‑off to a human

The graph makes clear which information must be collected first, which tasks can run concurrently, and how outputs jointly constrain the final solution. After the graph is built, the executor works on a set of schedulable, recoverable, and verifiable nodes instead of a vague “handle complaint”.
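The complaint-handling graph above can be written as data rather than prose. This is a sketch, not a specific framework's schema: node IDs, the `deps`/`tool` fields, and the tool names are illustrative, and acceptance criteria and failure handling are omitted for brevity. Given the set of completed nodes, `ready_nodes` returns every node whose dependencies are satisfied, i.e. exactly what the executor may run in parallel right now.

```python
# The complaint-handling task graph as a dependency structure.
TASK_GRAPH = {
    "identify":   {"deps": [],                          "tool": "intake"},
    "tier":       {"deps": ["identify"],                "tool": "crm"},
    "sla":        {"deps": ["identify"],                "tool": "contract_db"},
    "evidence":   {"deps": ["identify"],                "tool": "monitoring"},
    "candidates": {"deps": ["tier", "sla", "evidence"], "tool": "llm"},
    "approval":   {"deps": ["candidates"],              "tool": "human"},
}

def ready_nodes(graph, completed):
    """Nodes whose dependencies are all completed and that may run concurrently."""
    return sorted(
        node for node, spec in graph.items()
        if node not in completed and all(d in completed for d in spec["deps"])
    )

# After intake finishes, the tier, SLA, and evidence checks can run in parallel:
print(ready_nodes(TASK_GRAPH, completed={"identify"}))
# -> ['evidence', 'sla', 'tier']
```

The same structure also answers the failure question: if `sla` fails, only its downstream nodes (`candidates`, `approval`) are blocked, while `tier` and `evidence` proceed.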

3. ToT and GoT Turn Planning into a Searchable Candidate Space

Tree of Thoughts (ToT) treats intermediate reasoning results as expandable, scoreable, and backtrackable states. At key nodes the model generates multiple candidate thoughts and a scorer selects the most promising branches for further expansion.

Graph of Thoughts (GoT) relaxes the structure further: candidate ideas can be merged, rewritten, aggregated, and iteratively improved. This matches enterprise tasks where contract clauses, fault evidence, and customer value are repeatedly refined, naturally forming a graph.

When deployed, ToT/GoT perform three engineering actions: generate candidate paths, score them, and prune within a budget. Each candidate must carry cost, risk, evidence completeness, and executability information for the scorer.
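Those three actions (generate, score, prune within a budget) amount to a beam search over thoughts. The sketch below assumes stand-in `expand` and `score` functions in place of real model calls and the scorer discussed later; the beam width, depth, and call budget are illustrative knobs, not recommended values.

```python
# Budget-bounded Tree-of-Thoughts sketch: expand candidates, score them,
# keep only the best `beam` branches per depth, and stop hard at `budget` calls.

def tot_search(root, expand, score, beam=2, max_depth=3, budget=20):
    frontier = [root]
    calls = 0
    best = (score(root), root)
    for _ in range(max_depth):
        candidates = []
        for state in frontier:
            for child in expand(state):
                calls += 1
                if calls > budget:           # hard cost ceiling: stop searching
                    return best[1]
                candidates.append((score(child), child))
        if not candidates:
            break
        candidates.sort(reverse=True)        # prune: keep only the top-`beam` branches
        frontier = [c for _, c in candidates[:beam]]
        best = max(best, candidates[0])
    return best[1]

# Toy usage: grow a number digit by digit toward 42; scoring by distance to
# the target steers the beam without exploring every branch.
target = 42
expand = lambda s: [s * 10 + d for d in (1, 2, 4)]
score = lambda s: -abs(s - target)
print(tot_search(0, expand, score))  # -> 42
```

In production the `score` call is where cost, risk, evidence completeness, and executability enter, which is why each candidate must carry that information.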

4. State Machine as the Execution Anchor

The planning layer produces the task graph; the search layer produces candidate paths. The state machine consumes both and drives execution. Without a state machine, plans remain inside the model’s context and cannot survive process restarts, tool timeouts, or approval pauses.

For complex agents the state machine should record three categories of state:

Node status: planned, pending, running, blocked, awaiting approval, completed, failed.

Evidence status: data retrieved, insufficient evidence, conflicting evidence, needs additional retrieval.

Governance status: human confirmation required, authorized, rejected, compensated.
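Making these three categories explicit, serializable records is what lets execution survive restarts. A minimal sketch follows; the enum values mirror the lists above, while the JSON checkpoint is an assumed persistence format rather than any framework's actual storage layer.

```python
import json
from enum import Enum

# The three state categories as explicit enums (values follow the lists above).
class NodeStatus(str, Enum):
    PLANNED = "planned"; PENDING = "pending"; RUNNING = "running"
    BLOCKED = "blocked"; AWAITING_APPROVAL = "awaiting_approval"
    COMPLETED = "completed"; FAILED = "failed"

class EvidenceStatus(str, Enum):
    RETRIEVED = "data_retrieved"; INSUFFICIENT = "insufficient_evidence"
    CONFLICTING = "conflicting_evidence"; NEEDS_MORE = "needs_additional_retrieval"

class GovernanceStatus(str, Enum):
    NEEDS_HUMAN = "human_confirmation_required"; AUTHORIZED = "authorized"
    REJECTED = "rejected"; COMPENSATED = "compensated"

def checkpoint(nodes):
    """Serialize every node's state so execution can resume after a restart."""
    return json.dumps({nid: {k: v.value for k, v in s.items()}
                       for nid, s in nodes.items()})

nodes = {"sla": {"node": NodeStatus.RUNNING,
                 "evidence": EvidenceStatus.INSUFFICIENT,
                 "governance": GovernanceStatus.NEEDS_HUMAN}}
restored = json.loads(checkpoint(nodes))
# restored["sla"]["node"] == "running"
```

A process restart, tool timeout, or multi-day approval pause then reduces to reloading the checkpoint and recomputing which nodes are runnable.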

Frameworks such as LangGraph and AutoGen’s GraphFlow expose nodes, edges, checkpoints, and human interruptions as runtime capabilities, confirming that dialogue history alone cannot handle production‑grade scheduling.

5. Scorer Controls Costly Trial‑and‑Error

In production, ToT/GoT face a cost boundary: each candidate branch may trigger model calls, tool calls, database queries, sandbox runs, and human approvals. An unbounded search quickly turns quality improvement into latency and expense.

The scorer must answer more than “which answer looks better”. It should evaluate:

Evidence completeness

Business risk (amounts, permissions, contracts, notifications, irreversible effects)

Cost (model calls, tool calls, latency, human waiting time)

Recoverability (ability to retry, compensate, rollback, or keep a draft)

Completion (whether the solution meets user goals and business acceptance criteria)

The scorer’s output feeds the state machine, indicating why a path is kept, pruned, or escalated, which is essential for audit, tuning, and post‑mortem.
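A scorer along these lines can be sketched as a weighted combination of the five dimensions, with risk counting against a path and a hard risk ceiling that forces escalation. The weights and thresholds below are illustrative assumptions that a real deployment would tune against audit outcomes.

```python
# Multi-dimension scorer whose output drives keep / prune / escalate decisions.
# All dimensions are normalized to [0, 1]; weights are illustrative only.
WEIGHTS = {"evidence": 0.3, "cost": 0.15, "recoverability": 0.15,
           "completion": 0.25, "risk": 0.15}

def score_candidate(c):
    """Higher is better; business risk subtracts from the score."""
    s = sum(WEIGHTS[k] * c[k] for k in ("evidence", "cost", "recoverability", "completion"))
    return s - WEIGHTS["risk"] * c["risk"]

def decide(c, keep_threshold=0.5, risk_ceiling=0.8):
    if c["risk"] >= risk_ceiling:
        return "escalate"     # irreversible or high-stakes: hand to a human
    return "keep" if score_candidate(c) >= keep_threshold else "prune"

refund_small = {"evidence": 0.9, "cost": 0.8, "recoverability": 0.9,
                "completion": 0.8, "risk": 0.1}
refund_large = {"evidence": 0.9, "cost": 0.4, "recoverability": 0.2,
                "completion": 0.9, "risk": 0.9}
print(decide(refund_small), decide(refund_large))  # -> keep escalate
```

Logging the per-dimension values alongside the decision gives exactly the "why was this path kept, pruned, or escalated" trail the audit and post-mortem work needs.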

6. Four‑Layer Execution Architecture for Complex Agents

Combining the previous layers yields a four‑layer architecture:

Planning layer: Plan‑and‑Solve generates a task‑dependency graph (nodes, inputs, outputs, dependencies, acceptance criteria).

Search layer: ToT/GoT generate, score, and prune candidate paths.

State‑machine layer: Manages node lifecycle, evidence status, and checkpoints for resumability.

Governance layer: Handles approvals, budgets, permissions, compensation, audit logs, and human hand‑off.

Both planning and search produce structured representations that the state machine translates into concrete tool invocations, pauses, or human interventions.
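How the layers compose can be sketched in a few lines. Everything here is a hypothetical outline, not a framework API: the planner has already produced `graph`, the state machine schedules ready nodes, and the search layer is invoked only at nodes flagged as high-value decision points.

```python
# Minimal composition of the four layers: the state machine drives scheduling,
# plain nodes execute directly, and decision-point nodes go through search.

def run_agent(graph, execute, search, is_decision_point):
    completed, results = set(), {}
    while len(completed) < len(graph):
        ready = [n for n, spec in graph.items()
                 if n not in completed and all(d in completed for d in spec["deps"])]
        if not ready:
            raise RuntimeError("deadlock: remaining nodes have unmet dependencies")
        for node in ready:
            inputs = {d: results[d] for d in graph[node]["deps"]}
            # Search only where candidate comparison pays for its cost:
            results[node] = (search(node, inputs) if is_decision_point(node)
                             else execute(node, inputs))
            completed.add(node)
    return results

graph = {"plan": {"deps": []}, "act": {"deps": ["plan"]}}
out = run_agent(graph,
                execute=lambda n, i: f"done:{n}",
                search=lambda n, i: f"best-of-n:{n}",
                is_decision_point=lambda n: n == "act")
# out == {"plan": "done:plan", "act": "best-of-n:act"}
```

The governance layer would wrap the per-node call with approval checks and audit logging; it is left out here to keep the control flow visible.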

7. Practical Roll‑out Order

Teams should not implement the full GoT stack immediately. A safer path is:

Pick a real business chain with five or more steps that often fails and needs human confirmation (e.g., refund approval, fault compensation, contract review, data‑analysis report).

Convert existing ReAct logs into node states, making inputs, outputs, dependencies, and failure handling explicit.

Add a dependency‑graph generation step in the planning layer to identify parallelizable nodes and required sequencing.

Introduce ToT/GoT only at high‑value decision points, using a scorer to limit candidate count, cost, and risk.

This incremental approach avoids premature complexity; many tasks only need Plan‑and‑Solve, while a few critical branches benefit from ToT‑style candidate comparison.

Ultimately, the ceiling of complex agents depends not only on the model’s reasoning ability but also on the system’s capacity to organize reasoning into planning, search, and recoverable execution states.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: ReAct, State Machine, Agent, AI Planning, Tree of Thoughts, Plan-and-Solve, Graph of Thoughts

Written by AI Step-by-Step. Sharing AI knowledge, practical implementation records, and more.