
From ReAct to Loop Engineering: What Exactly Do AI Agents Loop?

This article analyses Loop Engineering as the missing engineering layer for AI agents. It defines a minimal Think‑Act‑Observe‑Verify‑Repeat cycle, outlines five loop categories and six hard boundaries for production use, and gives practical guidance for turning feedback into verifiable, stoppable, and hand‑off‑ready loops.

Architect

Jun 20, 2026


Background

Recent weeks have seen a surge of discussion around Loop Engineering, a term that points to a long‑standing problem: after an agent completes a step, how does the system decide the next one? The discussion comes from several angles (daily use of coding agents, Harness and worktree, execution graphs, self‑improvement, and long‑term memory), but they all share the same core question: can feedback reliably enter the next action?

In the agent engineering pipeline, we already have Prompt → Context → Harness → Goal → Self‑Harness → Environment. Loop sits between Harness and Environment, determining how feedback is fed into the next iteration.

Minimal Loop

The smallest practical agent loop consists of five stages:

Think: decide the next step based on goal and context
Act: invoke a tool or perform an action
Observe: read the result of the action
Verify: check whether the result meets the goal
Repeat: continue, stop, or hand over to a human

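The first three stages are essentially the classic ReAct (Reasoning + Acting) pattern; Verify and Repeat add an explicit completion check and a decision to continue, stop, or hand over.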
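The five stages above can be sketched as a small driver function. This is a minimal illustration, not an implementation from the article: `run_loop`, `LoopResult`, and the `think`/`act`/`verify` callables are hypothetical names, and the hard round limit stands in for the "hand over to a human" branch.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    status: str                      # "done" or "handoff"
    rounds: int
    history: list = field(default_factory=list)


def run_loop(think, act, verify, max_rounds=5):
    """Think -> Act -> Observe -> Verify -> Repeat, with a hard round limit."""
    history = []
    for round_no in range(1, max_rounds + 1):
        action = think(history)          # Think: choose the next step from what is known
        observation = act(action)        # Act + Observe: run the tool, read its result
        history.append((action, observation))
        if verify(observation):          # Verify: does the result meet the goal?
            return LoopResult("done", round_no, history)
    # Repeat budget exhausted: hand over instead of looping forever
    return LoopResult("handoff", max_rounds, history)


# Toy usage (assumed scenario): the "tool" increments a counter, the goal is 3.
state = {"n": 0}


def think(history):
    return "increment"


def act(action):
    state["n"] += 1
    return state["n"]


result = run_loop(think, act, verify=lambda obs: obs >= 3)
print(result.status, result.rounds)  # prints: done 3
```

In a real agent, `think` would call the model and `act` would invoke a tool; the point of the sketch is only that verification and the stop decision live in the loop, not in the prompt.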

Why Loop Engineering is Hot Now

Agents are no longer prompted for every single step; instead, the design of the loop determines the next step. Quotes from industry voices illustrate this shift:

Steipete: "Instead of prompting a coding agent step by step, design a loop that prompts the agent."

Boris Cherny: "I stopped prompting Claude directly and wrote a loop that prompts Claude and decides the next step."

Addy Osmani: "Break the problem into automation, worktree, skills, plugins/connectors, sub‑agents, and memory/state."

The key change is moving the decision‑making from the human into the system.

Five Loop Types

1. ReAct Loop (think‑act‑observe)

The base version, in which the model decides each step from the latest observation. It is flexible, but the context grows with every round and failure recovery is left undefined.

2. Plan‑and‑Execute Loop

The system first generates a plan and then executes it step by step. More controllable, but a wrong plan leads the agent down a faulty path.
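A rough sketch of this pattern, under stated assumptions: `plan_and_execute`, `make_plan`, and `execute_step` are hypothetical names, and the bounded replanning step is one possible answer to the "wrong plan" risk rather than something the article prescribes.

```python
def plan_and_execute(make_plan, execute_step, max_replans=1):
    """Plan first, run step by step, and replan only when a step fails."""
    completed = []
    replans = 0
    plan = make_plan(completed)
    while plan:
        step = plan.pop(0)
        if execute_step(step):
            completed.append(step)
            continue
        if replans >= max_replans:
            # Replan budget exhausted: stop and report where the plan broke
            return {"status": "handoff", "completed": completed, "failed": step}
        replans += 1
        plan = make_plan(completed)      # new plan, informed by what already succeeded
    return {"status": "done", "completed": completed, "failed": None}


# Toy usage (assumed scenario): "build" fails on its first attempt only.
attempts = {}


def make_plan(completed):
    return [s for s in ["fetch", "build", "test"] if s not in completed]


def execute_step(step):
    attempts[step] = attempts.get(step, 0) + 1
    return not (step == "build" and attempts[step] == 1)


outcome = plan_and_execute(make_plan, execute_step)
print(outcome["status"], outcome["completed"])  # prints: done ['fetch', 'build', 'test']
```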

3. Reflection / Evaluation Loop

After execution, an evaluator (tests, rules, screenshots, type checks, or a reviewer agent) validates the result. Separating executor and evaluator improves reliability.
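The executor/evaluator split might look like the sketch below. All names (`Verdict`, `run_with_evaluator`, the two toy checks) are illustrative assumptions; real evaluators would be test runs, linters, or a reviewer agent rather than string checks.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    passed: bool
    evidence: str


def run_with_evaluator(execute, evaluators, max_attempts=3):
    """The executor produces an artifact; independent evaluators decide if it passes."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        artifact = execute(feedback)                     # executor sees only prior failures
        verdicts = [check(artifact) for check in evaluators]
        feedback = [v.evidence for v in verdicts if not v.passed]
        if not feedback:
            return artifact, attempt
    return None, max_attempts                            # never passed: caller escalates


# Toy evaluators (assumed): rule-based checks standing in for tests or lint.
def no_todo(text):
    return Verdict("TODO" not in text, "contains TODO")


def ends_with_newline(text):
    return Verdict(text.endswith("\n"), "missing trailing newline")


def execute(feedback):
    text = "def f(): pass  # TODO"
    if "contains TODO" in feedback:
        text = text.replace("  # TODO", "")
    if "missing trailing newline" in feedback:
        text += "\n"
    return text


artifact, attempts_used = run_with_evaluator(execute, [no_todo, ends_with_newline])
print(repr(artifact), attempts_used)  # prints: 'def f(): pass\n' 2
```

The executor never marks its own work as done; it only receives the evaluators' evidence as input to the next attempt.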

4. Goal / Long‑Running Loop

Focuses on a persistent goal with explicit completion, verification, and constraints. Prevents the agent from losing sight of the objective over many iterations.

5. Optimization / Self‑Harness Loop

Collects failure traces, proposes Harness modifications, and runs regression tests before promoting changes. The system only adopts a modification after independent evaluation.
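One way to express the promotion gate is sketched below. The rule used here (no regressions and at least one newly passing case) is an assumption chosen for illustration, as are the names `promote_if_better` and `run_case` and the toy timeout-based harness.

```python
def promote_if_better(current, candidate, cases, run_case):
    """Adopt `candidate` only if it regresses nothing and fixes at least one case."""
    current_pass = {c for c in cases if run_case(current, c)}
    candidate_pass = {c for c in cases if run_case(candidate, c)}
    report = {
        "regressed": sorted(current_pass - candidate_pass),
        "gained": sorted(candidate_pass - current_pass),
    }
    report["promoted"] = not report["regressed"] and bool(report["gained"])
    return (candidate if report["promoted"] else current), report


# Toy usage (assumed): a harness is a config dict, a case passes if it fits the timeout.
durations = {"fast_task": 10, "slow_task": 60}


def run_case(harness, case):
    return durations[case] <= harness["timeout_s"]


current = {"timeout_s": 30}
adopted, report = promote_if_better(current, {"timeout_s": 90}, durations, run_case)
print(adopted, report["gained"])  # prints: {'timeout_s': 90} ['slow_task']
```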

Production‑Grade Loop: Six Hard Boundaries

A loop that works in a demo does not automatically qualify for production. The six essential boundaries are:

Verification: External evidence (tests, lint, type checks, screenshots, link checks, review handling, artifact placement) must prove task completion.

Stop: Clear stop conditions (goal reached, budget exhausted, consecutive no‑progress) prevent endless execution.

State: Persist current goal, attempted paths, failed attempts, key evidence, blockers, next‑step plan, and human decisions outside the transient LLM context.

Recovery: Define retry limits, idempotent command policies, alternative paths for tool failures, artifact‑loss handling, and escalation thresholds.

Isolation: Explicit worktree/branch, credential scope, side‑effect locations, and cleanup procedures to avoid cross‑agent contamination.

Observability: Record why a round started, which tools were called with what parameters, returned results, agent interpretation, decision checks, and the exact Harness/Skill version used.
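As an illustration of the Stop boundary, the three stop conditions named above can be encoded as a small policy object. This is a sketch under assumptions: `StopPolicy`, its default limits, and the `"continue"`/`"done"`/`"escalate"` labels are hypothetical and would need tuning for a real loop.

```python
from dataclasses import dataclass


@dataclass
class StopPolicy:
    max_rounds: int = 20          # budget, counted in rounds (assumed unit)
    max_no_progress: int = 3      # consecutive rounds without progress
    rounds: int = 0
    stalled: int = 0

    def update(self, goal_reached: bool, made_progress: bool) -> str:
        """Record one round and return 'continue', 'done', or 'escalate'."""
        self.rounds += 1
        self.stalled = 0 if made_progress else self.stalled + 1
        if goal_reached:
            return "done"
        if self.rounds >= self.max_rounds:
            return "escalate"     # budget exhausted
        if self.stalled >= self.max_no_progress:
            return "escalate"     # consecutive no-progress rounds
        return "continue"


policy = StopPolicy(max_rounds=10, max_no_progress=2)
decisions = [
    policy.update(goal_reached=False, made_progress=True),
    policy.update(goal_reached=False, made_progress=False),
    policy.update(goal_reached=False, made_progress=False),
]
print(decisions)  # prints: ['continue', 'continue', 'escalate']
```

Keeping the counters in a plain object rather than in the model's context also means they can be persisted alongside the rest of the loop state.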

Martin Fowler’s advice on feed‑forward and feedback in Harness applies directly: validation must be concrete, otherwise the loop becomes “busy work”.

Loop vs. Harness vs. Environment

Clarifying the boundaries:

Harness defines how the agent runs.
Loop defines how feedback enters the next round.
Environment defines where feedback originates.

Loop does not replace Harness; it is the rhythm that drives Harness‑managed work. Without Harness, a loop is just a chat‑based repetition; without Loop, Harness is a one‑off execution.

From Loop to Structured Graphs

A pure agent loop acts as a scheduler with a single ready unit: it always runs "the next step", so dependencies and recovery policies stay hidden inside the model's context. A Structured Graph Harness extracts the control flow into an explicit DAG, making dependencies, recovery, and history visible and debuggable.
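A minimal sketch of the explicit-DAG idea, using Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The `run_graph` function, the per-node retry policy, and the toy pipeline are assumptions for illustration, not the design of any specific graph harness.

```python
from graphlib import TopologicalSorter


def run_graph(dependencies, run_node, max_retries=1):
    """Run nodes in dependency order and keep a per-node history of attempts."""
    sorter = TopologicalSorter(dependencies)   # maps node -> set of predecessors
    sorter.prepare()
    history = {}
    while sorter.is_active():
        for node in sorter.get_ready():
            attempts, ok = 0, False
            while not ok and attempts <= max_retries:   # recovery policy is explicit
                attempts += 1
                ok = run_node(node)
            history[node] = {"ok": ok, "attempts": attempts}
            if not ok:
                return history                 # dependents of a failed node never run
            sorter.done(node)
    return history


# Toy usage (assumed pipeline): "build" fails once, then succeeds on retry.
deps = {"build": {"checkout"}, "test": {"build"}, "report": {"test"}}
calls = {}


def run_node(node):
    calls[node] = calls.get(node, 0) + 1
    return not (node == "build" and calls[node] == 1)


history = run_graph(deps, run_node)
print(list(history), history["build"]["attempts"])
# prints: ['checkout', 'build', 'test', 'report'] 2
```

Unlike a free-running loop, the dependency map and the retry count per node are data that can be inspected after the run.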

Open‑Source Projects to Study

When evaluating projects, focus on the six boundaries rather than feature count. Notable examples:

Codex CLI / Codex Goals: local coding agent, persistent goals, evidence checks.

OpenHands / Agent Canvas: agent server, automation server, Docker/VM/cloud backends.

PydanticAI: type safety, dependency injection, durable execution, tracing, human approval.

OpenAI Agents SDK: agents, handoffs, guardrails, sessions, tracing, sandbox agents.

All these projects move agents from the chat box into a runnable, traceable, pausable, and recoverable environment.

First Loop Implementation

Start with a small, well‑scoped loop. Examples of low‑risk loops include:

Documentation link checking

CI failure triage

Flaky test classification

Dependency upgrade pre‑check

Issue auto‑reproduction info

PR review comment fixing

Online error daily aggregation

Each loop should have a concise contract covering name, trigger, goal, inputs, scope, tools, verification, stop conditions, escalation, state persistence, and cleanup. A sample contract for a docs‑link loop is provided in the original article.
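The contract fields listed above map naturally onto a typed record. The sketch below is not the article's sample contract: the `LoopContract` class and every value in the docs‑link example are hypothetical and serve only to show the shape.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class LoopContract:
    name: str
    trigger: str
    goal: str
    inputs: tuple
    scope: str
    tools: tuple
    verification: str
    stop_conditions: tuple
    escalation: str
    state_persistence: str
    cleanup: str

    def missing_fields(self):
        """Return the names of fields left empty, so gaps are caught before a run."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical values for a documentation link-checking loop.
docs_links = LoopContract(
    name="docs-link-check",
    trigger="nightly schedule",
    goal="no broken links in the docs directory",
    inputs=("docs/",),
    scope="read-only except for a single fix branch",
    tools=("link checker", "git"),
    verification="link checker reports zero broken links",
    stop_conditions=("goal reached", "10 rounds", "2 rounds without progress"),
    escalation="open an issue listing links that could not be fixed",
    state_persistence="JSON ledger of attempted fixes per URL",
    cleanup="delete the fix branch if nothing was changed",
)

print(docs_links.missing_fields())                       # prints: []
print(replace(docs_links, cleanup="").missing_fields())  # prints: ['cleanup']
```

Making the contract frozen keeps it a declaration of intent; mutable progress belongs in the persisted state, not in the contract.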

Conclusion

Loop Engineering does not eliminate prompt engineering; prompts become part of goals, skills, runbooks, state ledgers, validators, and stop conditions. The real engineering discipline is to make feedback verifiable, hand‑off‑ready, and stoppable, mirroring traditional system design practices (interfaces, state, error codes, retries, idempotency, audit, rollback) for AI agents.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


