Why Agent Loops Matter More Than Raw Model Power
This article explains how AI agents that operate in a reasoning‑action‑observation loop outperform single‑shot LLM inference by continuously observing, planning, and correcting errors. It illustrates the idea with a ticket‑booking example and detailed analyses of the ReAct, Plan‑Execute, OODA, and Steering Loop architectures.
Limits of Single‑Shot Inference
LLMs generate output in a single one‑directional auto‑regressive pass (input → encode → decode → output) with no feedback mechanism. This creates three fundamental limitations:
No real feedback loop – the model can only hypothesize about an action (e.g., “search flights”) but cannot actually perform it or distinguish between expectation and observation.
Context window ceiling – each turn adds more history until the token limit is reached, causing the model to forget earlier information.
Inability to self‑correct errors – once an incorrect conclusion is emitted the model either continues hallucinating or must be interrupted; it has no internal trigger to retry.
These defects make single‑step reasoning unsuitable for real‑world, multi‑step, trial‑and‑error tasks, which require a loop that can observe, act, and re‑evaluate.
Agent Loops: Continuous Reasoning
An Agent wraps an LLM in an external loop that repeatedly runs observe → think → act → observe. The loop supplies real feedback, records intermediate state, and stops only when a termination condition is met.
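The observe → think → act cycle can be sketched as a minimal Python loop. The `think` and `act` functions below are illustrative stand‑ins for an LLM call and real tool APIs, and all names and canned results are hypothetical:

```python
def think(state):
    """Decide the next action from accumulated observations (LLM stand-in)."""
    obs = state["observations"]
    if not obs:
        return ("search_flights", None)
    if obs[-1] == "no direct flights":
        return ("search_rail", None)
    return ("finish", obs[-1])

def act(action, arg):
    """Execute the chosen action against the outside world (stubbed tools)."""
    results = {
        "search_flights": "no direct flights",
        "search_rail": "rail: 7h, 800 CNY",
    }
    return results[action]

def agent_loop(max_rounds=10):
    state = {"observations": []}
    for _ in range(max_rounds):        # hard iteration cap: one termination condition
        action, arg = think(state)
        if action == "finish":         # the model decides the task is done
            return arg
        state["observations"].append(act(action, arg))   # real feedback, not a guess
    return state["observations"][-1]   # fall back to the best known result
```

The key structural point is that each round's observation feeds the next round's reasoning, which a single forward pass cannot do.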
Four Core Loop Patterns
3.1 ReAct – Reasoning + Acting
ReAct interleaves Thought, Action, and Observation. The original formulation (Yao et al., 2022) is:
Thought → Action → Observation → Thought → Action → Observation → …

Worked example (ticket‑booking Agent):

Thought: "User wants to go to Shanghai; check direct flights."
Action: Call the flight‑search API.
Observation: "No direct flights."
Thought: "Try high‑speed rail or a connecting flight."
Action: Query rail and connecting flights in parallel.
Observation: "Rail 7 h, 800 ¥; connecting flight 5 h, 1000 ¥."
Thought: "Rail is slower but cheaper and avoids transfers. Recommend rail."
Action: Output the final recommendation.
ReAct allows dynamic branching (e.g., parallel exploration of rail and connecting flight) and makes the next step a product of reasoning rather than hard‑coded logic.
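One way to picture ReAct concretely is as a growing transcript: each round appends a Thought/Action/Observation triple that becomes input to the next step. In this sketch, `choose_step` stands in for the LLM and `TOOLS` for real APIs; all names and results are illustrative:

```python
def choose_step(transcript):
    """Return (thought, action) given the transcript so far (LLM stand-in)."""
    if "Observation: no direct flights" in transcript:
        return ("Try high-speed rail instead.", "search_rail")
    return ("User wants to go to Shanghai; check direct flights.", "search_flights")

TOOLS = {
    "search_flights": lambda: "no direct flights",
    "search_rail": lambda: "rail: 7h, 800 CNY",
}

def react(rounds=2):
    transcript = ""
    for _ in range(rounds):
        thought, action = choose_step(transcript)   # reasoning picks the action
        observation = TOOLS[action]()               # action produces real feedback
        transcript += (f"Thought: {thought}\n"
                       f"Action: {action}\n"
                       f"Observation: {observation}\n")
    return transcript
```

Because the next action is chosen by reading the transcript, branching is decided at run time rather than hard‑coded.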
3.2 Plan‑Execute – Map Before You Walk
Plan‑Execute mitigates ReAct’s myopia by creating a global plan first, then executing steps while observing results and replanning if needed:
Plan → Execute Step 1 → Observe → Execute Step 2 → Observe → … → Re‑plan if needed

In a competitive‑analysis task, a naive ReAct approach would search product A, discover five product lines, then search each line, later discover overlap with product B, and finally learn that B has acquired two of A's lines – wasted effort. Plan‑Execute first enumerates the questions to answer (e.g., "who are the competitors? what are their core products? how do their market positions differ?") and allocates steps accordingly, with a Re‑plan stage for unexpected findings.
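The plan‑first structure can be sketched as a queue of steps with a re‑plan hook. The plan, the canned findings, and the re‑plan trigger below are all hypothetical stand‑ins for LLM planning and real research tools:

```python
def make_plan(task):
    """LLM stand-in: enumerate the questions to answer before acting."""
    return ["list competitors", "map core products", "compare positioning"]

def execute(step):
    """Stubbed research step returning canned findings."""
    findings = {
        "list competitors": "competitors: A, B",
        "map core products": "B has acquired two of A's product lines",
        "compare positioning": "positions differ by price tier",
        "re-examine A's remaining lines": "A retains three product lines",
    }
    return findings[step]

def needs_replan(observation):
    return "acquired" in observation        # an unexpected structural change

def plan_execute(task):
    plan = make_plan(task)
    results = []
    while plan:
        step = plan.pop(0)
        obs = execute(step)
        results.append(obs)
        if needs_replan(obs):
            # fold the surprise into a revised plan instead of pressing on
            plan.insert(0, "re-examine A's remaining lines")
    return results
```

The difference from plain ReAct is that the step sequence exists up front, so surprises modify an explicit plan rather than derailing step‑by‑step improvisation.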
3.3 OODA – Observe, Orient, Decide, Act
Originating from John Boyd’s air‑combat research, OODA adds an explicit Orient stage that interprets observations against existing hypotheses. The cycle is:
Observe – gather raw data.
Orient – place the data in a cognitive frame and test assumptions.
Decide – select a course of action.
Act – execute, then loop back to Observe.
Orient provides deeper analysis than ReAct’s Thought, and the loop has no predefined endpoint – it continues until the opponent’s loop is outpaced.
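What distinguishes OODA is that Orient is a separate stage that tests a working hypothesis against fresh data before any decision is made. The sketch below uses an entirely hypothetical domain and stubbed sensor data:

```python
def observe():
    """Gather raw data (stubbed sensors)."""
    return {"fuel": 0.3, "target_distance": 120}

def orient(data, hypothesis):
    """Place data in a cognitive frame: revise the hypothesis if it no longer holds."""
    if data["fuel"] < 0.4 and hypothesis == "pursue":
        return "disengage"              # the assumption behind "pursue" failed
    return hypothesis

def decide(frame):
    """Select a course of action from the oriented frame."""
    return {"pursue": "close distance", "disengage": "return to base"}[frame]

def act(decision):
    return f"executing: {decision}"

def ooda_cycle(hypothesis="pursue"):
    data = observe()
    frame = orient(data, hypothesis)    # the stage ReAct's Thought collapses away
    decision = decide(frame)
    return act(decision), frame         # the revised frame seeds the next cycle
```

Without the Orient stage, the loop would decide directly on raw data and never notice that its standing hypothesis had become invalid.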
3.4 Steering Loop – Guides and Sensors
Steering Loop treats the Agent as model + scaffolding . Guides (pre‑action constraints) are applied before reasoning (e.g., "do not modify package.json", "prefer existing utility functions"). After execution, Sensors (post‑action checks) validate results (correctness, safety, compliance). The loop is:
Guides → LLM reasoning → Execute → Sensors → Result OK? → If not, back to Guides

This pattern mirrors classic control systems used in aerospace, industrial automation, and autonomous driving.
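A minimal sketch of the guide/sensor scaffolding, with a stubbed `propose` function standing in for LLM reasoning; the guide text, file names, and checks are illustrative:

```python
GUIDES = ["do not modify package.json"]     # pre-action constraints

def passes_guides(action):
    """Block disallowed actions before they run."""
    return not ("package.json" in action and "write" in action)

def sensors_ok(result):
    """Post-action validation (correctness/safety checks)."""
    return "error" not in result

def propose(attempt):
    # LLM stand-in: the first proposal violates a guide, the second succeeds
    return ["write package.json", "write src/utils.py"][attempt]

def steering_loop(max_attempts=3):
    for attempt in range(max_attempts):
        action = propose(attempt)
        if not passes_guides(action):       # rejected before execution
            continue                        # back to reasoning, guides intact
        result = f"wrote file via {action}"
        if sensors_ok(result):
            return result
    return "failed"
```

Note that guides act *before* execution and sensors *after*, which is exactly the split a feedback controller makes between setpoint constraints and measured output.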
Designing a Robust Agent Loop
4.1 When to Stop
Maximum iteration count (e.g., 5‑10 rounds) to avoid infinite loops.
Confidence threshold on the latest Observation (e.g., stop when confidence > X %).
Duplicate‑action detection: if the same Action yields identical Observation for three consecutive rounds, halt and change direction.
Good loops are not afraid to stop; they stop when further iteration adds no value.
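The three stop conditions above can be combined into a single check; the thresholds here are illustrative defaults, not recommendations from the source:

```python
def should_stop(round_no, confidence, recent_observations,
                max_rounds=10, conf_threshold=0.9, repeat_limit=3):
    """Return True when further iteration would add no value."""
    if round_no >= max_rounds:
        return True                                       # iteration cap reached
    if confidence > conf_threshold:
        return True                                       # confident enough to finish
    tail = recent_observations[-repeat_limit:]
    if len(tail) == repeat_limit and len(set(tail)) == 1:
        return True                                       # stuck repeating one result
    return False
```

In practice the duplicate‑action branch would trigger a change of direction rather than a hard stop, but the detection logic is the same.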
4.2 Managing Context Size
Each Thought‑Action‑Observation triple averages ~200 tokens, so ten rounds consume ~2,000 tokens; add system prompts, tool definitions, and verbose tool outputs, and long sessions can push against smaller context windows, leading to "mid‑session forgetting". Mitigation strategies:
Summary compression : after N rounds, compress prior Observations into a concise state (e.g., "Completed: flight search, rail search; Known: no direct flight, rail 7 h/800 ¥, connecting flight 5 h/1000 ¥").
Sliding window + persistent memory : keep only the most recent K rounds in the prompt; store older entries in an external vector DB or structured store for retrieval.
Stage‑specific context : use full context for the planning stage; switch to short context for each execution step.
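The first two strategies compose naturally: keep the last K rounds verbatim and compress everything older into a one‑line summary. In this sketch `compress` is a stand‑in for an LLM summarization call, and the round records are hypothetical:

```python
def compress(old_rounds):
    """Stand-in for an LLM summarization call over the evicted rounds."""
    done = ", ".join(r["action"] for r in old_rounds)
    return f"Completed: {done}"

def build_context(history, keep_last=3):
    """Sliding window: summary of old rounds + full detail for recent ones."""
    older, recent = history[:-keep_last], history[-keep_last:]
    parts = []
    if older:
        parts.append(compress(older))       # compressed state of evicted rounds
    for r in recent:                        # verbatim recent rounds
        parts.append(f"Action: {r['action']} -> {r['observation']}")
    return "\n".join(parts)
```

A persistent store (e.g., a vector DB) would hold the evicted rounds in full so they remain retrievable; here they survive only as the summary line.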
4.3 Handling Errors Gracefully
Prefer reversible actions : querying APIs is safe; irreversible actions (e.g., sending email, deleting records) require double confirmation or sandbox testing.
Checkpoint + rollback : after major milestones (e.g., "all candidate options identified"), snapshot the Agent’s state; if later observations contradict expectations, roll back instead of restarting from scratch.
Multi‑path verification : compute critical results via two independent methods; mismatched outcomes trigger a re‑check.
LLM hallucination rates are commonly reported between 5 % and 15 % per step. Errors compound across rounds: at a 10 % per‑step rate, the probability of at least one error in a 10‑round loop is 1 − 0.9¹⁰ ≈ 65 %, so these safeguards are essential.
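The compounding effect is just the complement rule, P(at least one error in n rounds) = 1 − (1 − p)ⁿ:

```python
def cumulative_error(p_step, rounds):
    """P(at least one error in `rounds` steps) given per-step error rate p_step."""
    return 1 - (1 - p_step) ** rounds

# Even a modest per-step rate compounds quickly:
# cumulative_error(0.05, 10) ≈ 0.40, cumulative_error(0.15, 10) ≈ 0.80
```

This assumes independent errors per round, which is optimistic: an uncaught error usually corrupts later rounds, which is exactly why checkpoints and multi‑path verification pay off.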
Key Insight
Intelligence is less about deeper single‑shot reasoning and more about the resilience of the loop. By embedding powerful LLMs in well‑designed Agent Loops—ReAct, Plan‑Execute, OODA, or Steering—developers transform a static answer generator into a proactive, error‑aware partner capable of sustained task execution.
ZhiKe AI