ReAct, Plan‑Execute, and Reflection: How Continuous Loops Make Agent Architecture Crucial

A single LLM call is a stateless function, but real‑world tasks require dynamic information gathering, hypothesis testing, and iterative refinement, so agents must operate in a continuous loop. This article analyzes core patterns such as ReAct, Plan‑Execute, Reflection, Multi‑Agent, and HITL, and highlights the state‑management, cost, debugging, and observability challenges they raise.

AI Engineer Programming

May 17, 2026

Agent Loop

Calling an LLM is a stateless function: it receives an input, produces an output, and terminates. The limitation of a single call is not model capability but isolation from the external world: once the prompt is fixed, nothing observed afterwards (a tool result, an error, a user correction) can influence the output.

Many real‑world tasks require dynamic information acquisition, hypothesis verification, and strategy adjustment. These tasks exhibit three structural challenges:

Incomplete information: not all required data can be packed into the initial prompt; the system must query during execution.

Step dependencies: the result of one step determines the direction of the next, so the full execution path is unknown beforehand.

Vague goals: user intent often needs clarification through feedback loops.

The minimal Agent Loop therefore consists of:

LLM receives the current state.

LLM outputs the next action.

The action is executed, producing a new state.

The new state becomes the input for the next LLM call.

The loop repeats until a termination condition is satisfied. Cost therefore shifts from a per‑call basis to the cumulative cost of the entire loop: each iteration resends the accumulated history, so the input grows with every step and total token consumption grows faster than the step count (roughly quadratically when each step adds a similar amount of context). Debugging likewise requires reconstructing the whole execution trace: every LLM input, decision, tool result, and state transition.
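The four‑step loop described above can be sketched in plain Python. This is a minimal sketch, not a reference implementation: `fake_llm`, `run_tool`, and the list‑of‑strings state are hypothetical stand‑ins, and the model is replaced by a scripted function so the example runs offline.

```python
from typing import Callable

def fake_llm(state: list) -> str:
    """Hypothetical stand-in for a model call: picks the next action from the state."""
    # A real model would read the whole history; this stub only counts observations.
    observations = [s for s in state if s.startswith("observation:")]
    return "finish" if len(observations) >= 2 else "lookup"

def run_tool(action: str) -> str:
    """Hypothetical tool executor that returns an observation string."""
    return f"observation: result of {action}"

def agent_loop(task: str, llm: Callable[[list], str], max_steps: int = 10) -> list:
    state = [f"task: {task}"]            # 1. the LLM receives the current state
    for _ in range(max_steps):           # hard cap so the loop always terminates
        action = llm(state)              # 2. the LLM outputs the next action
        if action == "finish":           # termination condition
            break
        state.append(f"action: {action}")
        state.append(run_tool(action))   # 3. executing the action yields a new state
        # 4. the updated state is the input of the next iteration
    return state

history = agent_loop("find the release date", fake_llm)
```

Note that `state` only ever grows, which is the source of the cumulative cost discussed above: every call to `llm` receives everything appended so far.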

ReAct (Reasoning + Acting)

ReAct interleaves reasoning and tool use in a single sequence to avoid hallucinations from static knowledge and to keep tool selection grounded in explicit thought. It cycles through three phases:

Thought: the model writes its current understanding and the logic for the next decision.

Action: the model selects a tool and supplies parameters.

Observation: the tool’s result is appended to the context for the next Thought.

Each Thought both records what the model has concluded so far and plans the next step, which makes the process auditable. Two issues appear in production:

Infinite‑loop risk when the model cannot reliably judge task completion. Mitigation: hard iteration limits (e.g., max 10–20 steps) and time‑ or cost‑based circuit breakers.

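Per‑step LLM cost: every step requires a model call, even when the next action is obvious. When the sequence of steps is predictable in advance, that overhead is wasted, which motivates a Plan‑Execute split.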
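The Thought, Action, Observation cycle and the hard iteration limit can be sketched as follows. The text format (`Action: tool(args)`, `Final Answer: ...`), the `TOOLS` registry, and `make_scripted_model` are assumptions made for illustration; real frameworks typically use structured tool‑calling rather than regex parsing.

```python
import re

# Hypothetical tool registry; a real agent would wrap APIs, search, or databases.
TOOLS = {"add": lambda a, b: str(int(a) + int(b))}

def make_scripted_model(outputs):
    """Return a fake model that replays fixed outputs, so the sketch runs offline."""
    replies = iter(outputs)
    return lambda context: next(replies)

def react(question, model, max_steps=10):
    context = f"Question: {question}"
    for _ in range(max_steps):                      # hard iteration limit
        output = model(context)                     # Thought plus Action, or a final answer
        context += "\n" + output
        final = re.search(r"Final Answer:\s*(.+)", output)
        if final:
            return final.group(1).strip(), context
        call = re.search(r"Action:\s*(\w+)\((.*)\)", output)
        if call is None:
            observation = "error: no action found"
        else:
            name = call.group(1)
            args = [a.strip() for a in call.group(2).split(",") if a.strip()]
            tool = TOOLS.get(name)
            observation = tool(*args) if tool else f"error: unknown tool {name}"
        context += f"\nObservation: {observation}"  # fed into the next Thought
    return None, context                            # limit hit without an answer

answer, trace = react("What is 2 + 3?", make_scripted_model([
    "Thought: I need the sum of 2 and 3.\nAction: add(2, 3)",
    "Thought: The observation gives the sum.\nFinal Answer: 5",
]))
```

Returning `None` when the limit is reached, instead of the last partial output, forces the caller to handle the "did not finish" case explicitly.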

Plan and Execute

Plan‑Execute separates cognition into a global planning phase and an execution phase.

Planner decomposes the whole task into an ordered list of sub‑tasks, each with explicit input dependencies and expected output formats.

Executor processes each sub‑task. Depending on the sub‑task, the executor may run a ReAct loop, make a single tool call, or execute deterministic code.

The planning phase requires a strong model for complex reasoning; execution can use weaker, cheaper models or pure code. The main production challenge is plan fragility: assumptions made during planning may be falsified at runtime (unexpected tool output formats, API failures, newly discovered constraints). Mitigation strategies:

Encode conditional branches explicitly in the plan.

Trigger re‑planning only on critical sub‑task failure to avoid semantic drift.
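A plan with explicit dependencies, and re‑planning that fires only on a critical failure, might look like the sketch below. `SubTask`, `execute`, `plan_and_execute`, and the demo planner are hypothetical names; the planner here is ordinary code standing in for a strong model.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class SubTask:
    name: str
    run: Callable[[dict], str]                 # receives outputs of its dependencies
    depends_on: list = field(default_factory=list)
    critical: bool = True

def execute(plan):
    """Run sub-tasks in order. Return (results, name of failed critical task or None)."""
    results = {}
    for task in plan:
        try:
            # Raises KeyError if a dependency produced no result.
            inputs = {dep: results[dep] for dep in task.depends_on}
            results[task.name] = task.run(inputs)
        except Exception:
            if task.critical:
                return results, task.name      # critical failure: ask for a new plan
            # Non-critical failure: skip the sub-task and keep going.
    return results, None

def plan_and_execute(planner, max_replans=1):
    failed: Optional[str] = None
    for _ in range(max_replans + 1):
        results, failed = execute(planner(failed))
        if failed is None:
            return results
    raise RuntimeError(f"still failing at {failed!r} after {max_replans} re-plan(s)")

def flaky_fetch(_inputs):
    raise ConnectionError("API unavailable")

def demo_planner(failed):
    # On the first call `failed` is None; on a re-plan it names the failed sub-task.
    fetch = flaky_fetch if failed is None else (lambda _inputs: "cached data")
    return [
        SubTask("fetch", fetch),
        SubTask("summarize", lambda inputs: inputs["fetch"].upper(), depends_on=["fetch"]),
    ]

results = plan_and_execute(demo_planner)
```

For simplicity this sketch restarts the whole plan after a re‑plan; a production executor would more likely reuse results from sub‑tasks that already succeeded.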

Reflection

Transformer decoders generate tokens autoregressively and cannot revisit earlier output. Reflection adds a separate reasoning pass after generation:

Generate: the model produces an answer.

Reflect: a reviewer model (same model with a different system prompt or a separate instance) evaluates the answer.

Refine: the original model revises the answer based on the reviewer’s feedback.

Each iteration involves two full‑context LLM calls (one to review, one to refine), so token usage grows quickly with the number of rounds. The main production problem is the lack of a well‑defined stopping criterion. Common safeguards:

Stop when the reviewer signals “pass”.

Enforce a maximum iteration count (e.g., 5–10).

Apply a per‑iteration token budget limit to prevent runaway costs.
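The three safeguards can be combined in one loop, as in this sketch. The `generate`, `review`, and `refine` callables stand in for LLM calls, `count_tokens` is a crude whitespace proxy rather than a real tokenizer, and the default limits are illustrative assumptions.

```python
def count_tokens(text: str) -> int:
    """Crude proxy for token count (assumption): whitespace-separated words."""
    return len(text.split())

def reflection_loop(generate, review, refine, max_iters=5, iter_token_budget=200):
    draft = generate()
    for _ in range(max_iters):                       # safeguard 2: iteration cap
        passed, feedback = review(draft)
        if passed:                                   # safeguard 1: reviewer says "pass"
            return draft, "pass"
        revised = refine(draft, feedback)
        spent = count_tokens(feedback) + count_tokens(revised)
        if spent > iter_token_budget:                # safeguard 3: per-iteration budget
            return revised, "budget_exceeded"
        draft = revised
    return draft, "max_iterations"

# Stub "models" so the sketch runs offline.
def generate():
    return "The loop runs forever"

def review(draft):
    ok = "limit" in draft
    return ok, "" if ok else "Mention the step limit."

def refine(draft, feedback):
    return draft + " unless a step limit stops it"

final, status = reflection_loop(generate, review, refine)
```

Returning the stop reason alongside the draft lets callers distinguish an approved answer from one that was merely cut off.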

Multi‑Agent Collaboration

A single agent faces two structural constraints in complex tasks:

Context‑window limit: accumulated history eventually exceeds the model’s window, causing loss of early information.

Specialization limit: an agent handling diverse domains (legal, financial, code) performs worse than domain‑specific agents.

Multi‑Agent architectures assign specialized agents to narrow contexts and use an Orchestrator for global coordination. Three common topologies:

Supervisor: the Orchestrator routes tasks to appropriate agents, aggregates results, and decides further delegation. Routing quality is critical because misrouting is hard to debug.

Scatter‑Gather: the same goal is processed in parallel by multiple agents; an Aggregator merges results, requiring conflict‑resolution logic.

Pipeline: output of Agent A becomes input of Agent B, forming a chain; strict interface contracts are required to avoid silent format mismatches.
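For the Pipeline topology, one way to enforce an interface contract is to validate Agent A's output before Agent B sees it. The sketch below uses a hand‑written check; the `Extraction` schema, field names, and both agent stubs are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extraction:
    """Contract for what agent A must hand to agent B (hypothetical schema)."""
    company: str
    revenue_musd: float

def parse_contract(payload: dict) -> Extraction:
    missing = {"company", "revenue_musd"} - payload.keys()
    if missing:
        raise ValueError(f"contract violation, missing fields: {sorted(missing)}")
    revenue = payload["revenue_musd"]
    if isinstance(revenue, bool) or not isinstance(revenue, (int, float)):
        raise ValueError("contract violation: revenue_musd must be a number")
    return Extraction(str(payload["company"]), float(revenue))

def agent_a(text: str) -> dict:
    # Stub for an extraction agent; a real one would call an LLM.
    return {"company": "Acme", "revenue_musd": 12.5}

def agent_b(item: Extraction) -> str:
    # Stub for a reporting agent that relies on the validated fields.
    return f"{item.company}: ${item.revenue_musd:.1f}M"

def pipeline(text: str) -> str:
    return agent_b(parse_contract(agent_a(text)))

report = pipeline("Acme reported revenue of 12.5 million dollars.")
```

Failing loudly at the boundary turns a silent format mismatch into an error that names the offending field.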

Production concerns include uncontrolled token consumption: each agent carries its own system prompt and context, and the Orchestrator’s routing decisions also invoke the LLM. Recommended monitoring:

Track per‑agent token usage.

Alert at 80 % of the allocated budget.

Hard‑stop at 100 %.
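The three monitoring rules above map onto a small per‑agent counter. This is a sketch under assumptions: `TokenBudget` and `BudgetExceeded` are hypothetical names, and token counts are supplied by the caller (for example, an estimate made before the call).

```python
class BudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    def __init__(self, agent: str, limit: int, alert_ratio: float = 0.8):
        self.agent = agent
        self.limit = limit
        self.alert_ratio = alert_ratio
        self.used = 0
        self.alerts = []

    def charge(self, tokens: int) -> None:
        """Record usage for one call; refuse the call if it would pass 100 %."""
        if self.used + tokens > self.limit:
            raise BudgetExceeded(f"{self.agent}: {self.used}+{tokens} > {self.limit}")
        self.used += tokens
        # Alert once when usage first reaches the alert threshold.
        if not self.alerts and self.used >= self.alert_ratio * self.limit:
            self.alerts.append(f"{self.agent} at {self.used}/{self.limit} tokens")

budget = TokenBudget("researcher", limit=1000)
budget.charge(700)   # below the alert threshold
budget.charge(150)   # crosses 80 %, records one alert
try:
    budget.charge(200)   # would exceed 100 %, so the call is refused
    stopped = False
except BudgetExceeded:
    stopped = True
```

Checking before the call, not after, means the hard stop prevents the spend instead of merely reporting it.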

Human‑in‑the‑Loop (HITL)

HITL is a proactive design where the system pauses at designated control‑flow nodes, awaits human input, and resumes with the new input as the next state. Implementation requires:

State serialization before the pause.

State restoration after the human response.

Without persistence, a restart would lose the entire execution context. Checkpointing also enables time‑travel debugging: revert to any prior checkpoint, inject corrected state, and re‑execute downstream steps without rerunning the whole task.
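A pause‑and‑resume cycle can be sketched with JSON checkpoints on disk. The state layout, the step names, and the functions `run_until_approval` and `resume` are hypothetical; frameworks with built‑in checkpointing handle this differently.

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state))

def load_checkpoint(path: Path) -> dict:
    return json.loads(path.read_text())

def run_until_approval(state: dict, path: Path) -> None:
    """Run up to the control-flow node that needs a human, then persist and stop."""
    state["steps_done"].append("draft_email")
    state["status"] = "awaiting_human"
    state["pending_question"] = "Send this email?"
    save_checkpoint(path, state)          # serialize before the pause

def resume(path: Path, human_input: str) -> dict:
    """Restore the saved state and continue with the human response as new input."""
    state = load_checkpoint(path)         # restore after the human responds
    state["human_input"] = human_input
    state["status"] = "running"
    next_step = "send_email" if human_input == "approve" else "discard_email"
    state["steps_done"].append(next_step)
    return state

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "run-1.json"
    run_until_approval({"steps_done": [], "status": "running"}, checkpoint)
    # The process could exit here; only the checkpoint file needs to survive.
    final_state = resume(checkpoint, "approve")
```

Keeping one checkpoint per step instead of overwriting a single file is what would make the time‑travel debugging described above possible.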

Main Frameworks

LangGraph models agent workflows as directed graphs: nodes are execution units and edges are state transitions. It supports static and dynamic routing, fan‑out, and fan‑in. Version 1.1 adds reliability middleware (exponential back‑off retries, content moderation) and automatic checkpointing for pause‑resume and time‑travel debugging.

CrewAI adopts role‑based multi‑agent orchestration. Version 1.12 introduces reusable Agent Skills, a visual editor, and Qdrant Edge memory for hierarchical isolation, lowering the entry barrier compared with LangGraph.

OpenAI Agents SDK defines five primitives: Agent, Handoff, Guardrails, Sessions, and Tracing. Version 0.13 adds an any‑LLM adapter, breaking the OpenAI‑only limitation, and supports MCP resources and session persistence. Handoff becomes a bottleneck when the number of agents exceeds eight to ten.

Google ADK uses a hierarchical agent tree with a native Agent‑to‑Agent protocol, enabling cross‑framework interoperability. It provides built‑in multimodal support (image, audio, video) via Gemini APIs and deep integration with Vertex AI and Google Cloud services.

Pitfalls Guide

Cost accumulates with every loop iteration, and Reflection amplifies it because each round adds two full‑context LLM calls on top of a growing history. Mitigation layers:

Token‑budget tracking: each agent instance holds a cost counter; abort before a call if the budget would be exceeded.

Circuit breakers: abort the entire task when a single‑task cost threshold is crossed.

Semantic caching: cache results of identical tool calls or LLM requests to avoid duplicate charges.
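The caching layer can be illustrated with an exact‑match cache keyed on the tool name and its arguments. Note the swap: this sketch only catches identical calls, whereas a cache that matches requests by meaning would need an embedding‑similarity lookup, which is not shown. `ToolCallCache` and `slow_search` are hypothetical names.

```python
import hashlib
import json

class ToolCallCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # sort_keys makes the key independent of argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool, fn, /, **args):
        """Run fn(**args) once per distinct (tool, args); arguments must be JSON-serializable."""
        key = self._key(tool, args)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = fn(**args)
        self._store[key] = result
        return result

executed = []

def slow_search(query):
    executed.append(query)               # records every real (billable) execution
    return f"results for {query}"

cache = ToolCallCache()
first = cache.call("search", slow_search, query="agent loops")
second = cache.call("search", slow_search, query="agent loops")   # served from cache
third = cache.call("search", slow_search, query="reflection")
```

Caching is only safe for tools whose results do not change between calls; anything time‑sensitive would also need an expiry policy.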

Hard limits are essential. Do not rely on the model’s self‑judged stop signal. Typical caps (configured at the framework level): execution timeout 10–30 seconds, maximum API calls per task 5–10, maximum token consumption per task (e.g., 100 K tokens). When any cap is hit, terminate, record the partial state, and return results with an interruption flag.
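A runner that enforces all three caps and returns partial results with an interruption flag could look like this. The `Limits` defaults mirror the ranges quoted above, while `run_with_limits`, the step signature (each step returns an output and the tokens it used), and the injectable `clock` are assumptions for illustration.

```python
import time
from dataclasses import dataclass

@dataclass
class Limits:
    timeout_s: float = 30.0
    max_calls: int = 10
    max_tokens: int = 100_000

def run_with_limits(steps, limits, clock=time.monotonic):
    start = clock()
    calls = tokens = 0
    partial = []
    for step in steps:
        # Check every cap before starting the next step.
        if clock() - start > limits.timeout_s:
            reason = "timeout"
        elif calls >= limits.max_calls:
            reason = "max_calls"
        elif tokens >= limits.max_tokens:
            reason = "max_tokens"
        else:
            reason = None
        if reason is not None:
            return {"results": partial, "interrupted": True, "reason": reason}
        output, used = step()
        calls += 1
        tokens += used
        partial.append(output)
    return {"results": partial, "interrupted": False, "reason": None}

# Five stub steps that each "consume" 40,000 tokens.
steps = [lambda i=i: (f"step {i}", 40_000) for i in range(5)]
outcome = run_with_limits(steps, Limits())
```

Because the caps are checked between steps, a single long step can still overshoot; enforcing the timeout inside a step would need cancellation support from the underlying client.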

Observability is a prerequisite for debugging. For each LLM call record input tokens, output tokens, latency, model version, and request ID. For each tool call record tool name, input parameters, result or error, and execution time. For each routing decision record source node, destination node, and trigger condition. Use a unified trace_id across all records to reconstruct the full execution trajectory.
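The record types listed above can be captured with a small in‑memory tracer. This is a sketch only: `Tracer`, its method names, and the sample values (model name, request ID) are hypothetical, and a production system would export to a tracing backend instead of a list.

```python
import json
import time
import uuid

class Tracer:
    def __init__(self):
        self.trace_id = uuid.uuid4().hex   # shared by every record of one task
        self.records = []

    def _emit(self, kind, **fields):
        record = {"trace_id": self.trace_id, "kind": kind, "ts": time.time(), **fields}
        self.records.append(record)

    def llm_call(self, *, model, input_tokens, output_tokens, latency_ms, request_id):
        self._emit("llm_call", model=model, input_tokens=input_tokens,
                   output_tokens=output_tokens, latency_ms=latency_ms,
                   request_id=request_id)

    def tool_call(self, *, tool, params, duration_ms, result=None, error=None):
        self._emit("tool_call", tool=tool, params=params, duration_ms=duration_ms,
                   result=result, error=error)

    def route(self, *, source, destination, trigger):
        self._emit("route", source=source, destination=destination, trigger=trigger)

    def dump(self) -> str:
        """Serialize the trace as JSON Lines, one record per line."""
        return "\n".join(json.dumps(r) for r in self.records)

tracer = Tracer()
tracer.llm_call(model="example-model", input_tokens=812, output_tokens=64,
                latency_ms=930, request_id="req-001")
tracer.tool_call(tool="search", params={"query": "agent loops"}, duration_ms=120,
                 result="3 documents")
tracer.route(source="planner", destination="executor", trigger="plan_ready")
log_lines = tracer.dump().splitlines()
```

Filtering the log by `trace_id` then yields the ordered sequence of decisions, tool results, and transitions for a single task.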

Conclusion

Model strength continues to grow, absorbing many capabilities that previously required separate components. However, production‑grade Agent systems succeed or fail based on architectural patterns as much as on model quality. Stronger models improve single‑call quality but cannot solve control‑flow design, state management, termination‑condition gaps, or cost‑overrun issues. Core ideas—ReAct’s Thought‑Action‑Observation cycle, Plan‑Execute’s planning‑execution split, and Reflection’s iterative self‑critique—remain stable, while frameworks evolve rapidly to provide observability, cost control, fault recovery, and regression testing.
