From Task Cycles to a Maintainable, Observable, Replayable Agent Loop
The article explains how Loop Engineering turns multi‑round Agent execution into a maintainable, observable, and replayable closed loop: it defines six core components, reuses traditional development patterns, walks through a CI‑failure triage demo, and highlights architectural and practical pitfalls.
TL;DR
A Loop behaves like a task runtime, not a longer prompt.
A usable Loop must expose State, Intent, Action, Verify, Commit, and Trace.
State machines, job runners, CI pipelines, and front‑end state flows are useful models for Loop design.
A good first version is a small CI‑failure triage loop: read more, write less, and keep evidence explicit.
Six components of a Loop
State: stores the current facts of the task. Common mistake: relying only on chat‑context memory.
Intent: decides the next step. Common mistake: letting the model make changes while it is still deciding what to do.
Action: accesses external systems. Common mistake: granting overly broad tool permissions or having no whitelist.
Verify: checks whether a result is credible. Common mistake: letting the executor approve its own work.
Commit: writes the final result to real systems. Common mistake: mixing candidate and final results.
Trace: records what happened in each round. Common mistake: keeping only the final summary.
Figure 1: Loop’s six components
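The following is a minimal TypeScript sketch of how the six components can stay separate in code. The LoopComponents and TraceEntry interfaces and the runRound function are hypothetical names chosen for illustration, not an API from the article. State is simply the value S that is passed in and returned, so it lives outside any chat context.

```typescript
// Hypothetical interface: one method per Loop component (State is the S value itself).
interface LoopComponents<S, I, R> {
  decide(state: S): I;                     // Intent: pick the next step from explicit state
  act(intent: I, state: S): R;             // Action: the only code that touches external systems
  verify(result: R, state: S): boolean;    // Verify: a separate judge, not the executor
  commit(result: R, state: S): S;          // Commit: the only path from candidate to final
  trace(entry: TraceEntry<S, I, R>): void; // Trace: record every round, not just the summary
}

interface TraceEntry<S, I, R> {
  round: number;
  state: S;
  intent: I;
  result: R;
  verified: boolean;
}

// One round: unverified results remain candidates and do not change state.
function runRound<S, I, R>(c: LoopComponents<S, I, R>, state: S, round: number): S {
  const intent = c.decide(state);
  const result = c.act(intent, state);
  const verified = c.verify(result, state);
  const next = verified ? c.commit(result, state) : state;
  c.trace({ round, state, intent, result, verified });
  return next;
}

// Toy usage: count to 3, accepting only results that advance by exactly one.
const log: TraceEntry<number, "increment", number>[] = [];
const counter: LoopComponents<number, "increment", number> = {
  decide: () => "increment",
  act: (_intent, current) => current + 1,
  verify: (result, current) => result === current + 1,
  commit: (result) => result,
  trace: (entry) => { log.push(entry); },
};
let s = 0;
for (let round = 0; round < 3; round++) s = runRound(counter, s, round);
console.log(s, log.length); // 3 3
```

Because verify is a separate method, the code that produces a result is not the code that approves it. A failed check also leaves the state unchanged instead of committing a candidate.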
Applying traditional development experience
Job runner – explicit state
Async jobs usually track more than "running" and "done". A minimal state machine can be:
```text
pending -> running -> retrying -> succeeded
running -> failed -> retrying
running -> blocked -> needs_human
retrying -> failed_permanently
```
Agent Loops need a similar state machine; otherwise the system only knows that the Agent is still running, not what it has accomplished.
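One way to make these states explicit is a transition table that rejects illegal moves. This sketch treats the diagram's edges as the complete set, so any move the diagram does not list (for example running -> succeeded) is rejected. Widen the table if your runner allows such a move. JobState, transitions, and canTransition are illustrative names.

```typescript
// The job states from the diagram above.
type JobState =
  | "pending" | "running" | "retrying" | "succeeded"
  | "failed" | "blocked" | "needs_human" | "failed_permanently";

// Legal transitions, copied edge by edge from the diagram; terminal states have none.
const transitions: Record<JobState, readonly JobState[]> = {
  pending: ["running"],
  running: ["retrying", "failed", "blocked"],
  retrying: ["succeeded", "failed_permanently"],
  failed: ["retrying"],
  blocked: ["needs_human"],
  succeeded: [],
  needs_human: [],
  failed_permanently: [],
};

function canTransition(from: JobState, to: JobState): boolean {
  return transitions[from].indexOf(to) !== -1;
}

// Returns the new state, or throws so an illegal move is visible instead of silently applied.
function transition(from: JobState, to: JobState): JobState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}

console.log(canTransition("running", "blocked"));   // true
console.log(canTransition("pending", "succeeded")); // false
```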
CI pipeline – artifact per step
Each CI step leaves an artifact (commit, job, log, report). A Loop must record evidence for every round: which files were read, which commands ran, what errors occurred, and why a decision was made.
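Code can enforce this per-round record. The following is a minimal sketch with a hypothetical RoundEvidence shape that mirrors the items listed above. The EvidenceLog class refuses any round that does not state a reason for its decision, much as a CI step without an artifact would be suspicious. The example values are illustrative only.

```typescript
// Hypothetical per-round record, mirroring the evidence listed above.
interface RoundEvidence {
  round: number;
  filesRead: string[];
  commandsRun: { command: string; exitCode: number }[];
  errors: string[];
  decision: string;       // what the Loop decided this round
  decisionReason: string; // why it decided that; must not be blank
}

// Append-only log: a round without a stated reason is rejected.
class EvidenceLog {
  private readonly rounds: RoundEvidence[] = [];

  record(entry: RoundEvidence): void {
    if (entry.decisionReason.trim() === "") {
      throw new Error(`round ${entry.round}: decision "${entry.decision}" has no reason`);
    }
    // Store a shallow-frozen copy so later code cannot silently rewrite this round.
    this.rounds.push(Object.freeze({ ...entry }));
  }

  all(): readonly RoundEvidence[] {
    return this.rounds;
  }
}

// Usage with illustrative values only.
const evidenceLog = new EvidenceLog();
evidenceLog.record({
  round: 1,
  filesRead: ["ci/job.log"],
  commandsRun: [{ command: "npm test", exitCode: 1 }],
  errors: ["1 test failed"],
  decision: "classify as test failure",
  decisionReason: "the failing test file was changed in the PR",
});
console.log(evidenceLog.all().length); // 1
```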
Front‑end state flow – separate candidate actions from commits
In complex UIs, the view is derived from state, and actions are batched and then submitted. The same pattern applies here: let the model generate candidate actions (a plan, a diff, a comment draft), and let a controlled executor perform side effects such as writing files or opening PRs.
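The following is a minimal sketch of that separation. It assumes three hypothetical candidate kinds and an approval flag that something other than the model sets. The model only ever builds CandidateAction values. executeSubmission is the one place where side effects happen, and it runs only whitelisted kinds.

```typescript
// Candidate actions the model may produce; creating one has no side effects.
type CandidateAction =
  | { kind: "plan"; steps: string[] }
  | { kind: "diff"; path: string; patch: string }
  | { kind: "comment_draft"; target: string; body: string };

// A batch of candidates, like a batched UI submission; `approved` is set outside the model.
interface Submission {
  candidates: CandidateAction[];
  approved: boolean;
}

// The controlled executor: the only function that performs side effects.
// `perform` stands in for a real write (file system, code host API) and is hypothetical.
function executeSubmission(
  submission: Submission,
  allowedKinds: readonly CandidateAction["kind"][],
  perform: (action: CandidateAction) => void,
): number {
  if (!submission.approved) return 0; // unapproved candidates stay candidates
  let applied = 0;
  for (const action of submission.candidates) {
    if (allowedKinds.indexOf(action.kind) === -1) continue; // not whitelisted: skip
    perform(action);
    applied++;
  }
  return applied;
}

// Only comment drafts are whitelisted here, so the diff is not applied.
const performed: string[] = [];
const count = executeSubmission(
  {
    candidates: [
      { kind: "comment_draft", target: "PR", body: "Likely flaky test; see linked log." },
      { kind: "diff", path: "src/app.ts", patch: "..." },
    ],
    approved: true,
  },
  ["comment_draft"],
  (action) => { performed.push(action.kind); },
);
console.log(count); // 1
```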
Minimal demo: CI failure triage Loop
The demo reads a failing CI job, its logs, the related PR and recent commit, classifies the failure, and produces an evidence‑backed suggestion. If evidence is insufficient or the failure is permission‑related, it hands off to a human.
A run starts from an explicit, serializable state object:
```json
{
  "runId": "ci-triage-20260626-001",
  "goal": "triage failing CI jobs",
  "phase": "collecting",
  "attempt": 0,
  "maxAttempts": 2,
  "evidence": [],
  "classification": null,
  "proposal": null,
  "handoffReason": null
}
```
Phase type definition:
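```typescript
// Every phase the triage Loop can be in; "done" and "needs_human" are terminal.
type Phase =
  | "collecting"
  | "classifying"
  | "drafting"
  | "verifying"
  | "ready_to_commit"
  | "done"
  | "needs_human";
```
The reducer drives state transitions based on events: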
```typescript
// Pure reducer: state changes only by folding an event into it.
function reduce(state: LoopState, event: Event): LoopState {
  switch (event.type) {
    case "EVIDENCE_COLLECTED":
      // Append new evidence; never overwrite what earlier rounds found.
      return { ...state, phase: "classifying", evidence: [...state.evidence, ...event.evidence] };
    case "CLASSIFIED":
      // Permission failures are never handled automatically.
      if (event.classification === "permission_failure") {
        return {
          ...state,
          phase: "needs_human",
          classification: event.classification,
          handoffReason: "Permission failure requires human review",
        };
      }
      return { ...state, phase: "drafting", classification: event.classification };
    case "PROPOSAL_DRAFTED":
      // Still only a candidate until verification passes.
      return { ...state, phase: "verifying", proposal: event.proposal };
    case "VERIFIED":
      return { ...state, phase: "ready_to_commit" };
    case "VERIFICATION_FAILED":
      // Out of attempts: hand off instead of looping forever.
      if (state.attempt + 1 >= state.maxAttempts) {
        return { ...state, phase: "needs_human", handoffReason: event.reason };
      }
      // Retry from evidence collection and drop the rejected candidate results.
      return { ...state, phase: "collecting", attempt: state.attempt + 1, classification: null, proposal: null };
    case "COMMITTED":
      return { ...state, phase: "done" };
    case "HANDOFF":
      return { ...state, phase: "needs_human", handoffReason: event.reason };
    default:
      // Unknown events leave state untouched.
      return state;
  }
}
```
Intent selection maps the current phase to the next intent:
```typescript
// The next intent is derived from the explicit phase alone.
function selectIntent(state: LoopState): Intent {
  switch (state.phase) {
    case "collecting": return { type: "COLLECT_EVIDENCE" };
    case "classifying": return { type: "CLASSIFY" };
    case "drafting": return { type: "DRAFT_PROPOSAL" };
    case "verifying": return { type: "VERIFY" };
    case "ready_to_commit": return { type: "COMMIT" };
    case "done":
    case "needs_human":
      // Terminal phases: stop the loop.
      return { type: "STOP" };
  }
}
```
The main loop runs for a bounded number of steps. Each step stores the current state, selects an intent, performs the effect, and reduces the resulting event back into state:
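```typescript
const MAX_STEPS = 12; // hard upper bound on rounds per run

async function runLoop(env: Env, initialState: LoopState): Promise<LoopState> {
  let state = initialState;
  for (let step = 0; step < MAX_STEPS; step++) {
    // Persist every state before acting so the run can be traced and replayed.
    await env.store.append(state);
    const intent = selectIntent(state);
    if (intent.type === "STOP") return state;
    // The only call that touches external systems.
    const event = await env.effects.perform(intent, state);
    state = reduce(state, event);
  }
  // Step budget exhausted: hand off, and store the handoff state too.
  const handedOff = reduce(state, { type: "HANDOFF", reason: "Loop exceeded step limit" });
  await env.store.append(handedOff);
  return handedOff;
}
```
The effects.perform function is the only place that accesses external tools. It can read CI logs, classify failures, generate suggestions, and write comments, but it never bypasses the state machine to modify final results directly: every change to the run's state goes through an event and the reducer.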
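The snippets above use LoopState, Event, Intent, and Env without defining them. The declarations below are one plausible set, not the article's own. LoopState mirrors the JSON state object. Event has one variant per case in reduce, and Intent has one type per value that selectIntent returns. The Evidence shape, the string proposal, and every classification except permission_failure are assumptions.

```typescript
export {}; // makes this file a module, so this Event type does not clash with the DOM's global Event

// Repeated from above so this block compiles on its own.
type Phase =
  | "collecting" | "classifying" | "drafting" | "verifying"
  | "ready_to_commit" | "done" | "needs_human";

// Hypothetical shapes: adapt to your CI system.
interface Evidence {
  source: string;  // where the fact came from (log URL, file path, commit SHA)
  excerpt: string; // the relevant snippet
}
type Classification = "test_failure" | "build_failure" | "flaky_test" | "permission_failure";

// Mirrors the JSON state object above.
interface LoopState {
  runId: string;
  goal: string;
  phase: Phase;
  attempt: number;
  maxAttempts: number;
  evidence: Evidence[];
  classification: Classification | null;
  proposal: string | null;
  handoffReason: string | null;
}

// One variant per case handled by reduce().
type Event =
  | { type: "EVIDENCE_COLLECTED"; evidence: Evidence[] }
  | { type: "CLASSIFIED"; classification: Classification }
  | { type: "PROPOSAL_DRAFTED"; proposal: string }
  | { type: "VERIFIED" }
  | { type: "VERIFICATION_FAILED"; reason: string }
  | { type: "COMMITTED" }
  | { type: "HANDOFF"; reason: string };

// One type per value returned by selectIntent().
type Intent = {
  type: "COLLECT_EVIDENCE" | "CLASSIFY" | "DRAFT_PROPOSAL" | "VERIFY" | "COMMIT" | "STOP";
};

// Dependencies injected into runLoop(): a state store and the single effects gateway.
interface Env {
  store: { append(state: LoopState): Promise<void> };
  effects: { perform(intent: Intent, state: LoopState): Promise<Event> };
}

// Example: an in-memory Env for tests; the effect stub always hands off.
const savedStates: LoopState[] = [];
const testEnv: Env = {
  store: { append: async (state) => { savedStates.push(state); } },
  effects: { perform: async () => ({ type: "HANDOFF", reason: "stub effect" } as Event) },
};
```

An in-memory Env like testEnv lets you run the Loop in tests without touching CI, and the saved states double as a replayable trace.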
Side‑effect management (front‑end view)
Read files, logs, issues: allowed by default; the source must be recorded.
Generate plan/diff/comment draft: candidate results only, no direct commit.
Write docs, open candidate PR: low‑risk writes; require a reviewable diff.
Change permissions, deploy, delete data: human confirmation required.
Retry external API: enforce rate limits, back‑off, and a maximum number of attempts.
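This table can live in code rather than in the prompt. The following is a minimal sketch that groups the rows into four hypothetical effect kinds. It also computes a capped exponential back-off schedule for the retry row. Rate limiting and jitter are left out for brevity. Whether low-risk writes run automatically is a policy choice assumed here, not something the article states.

```typescript
// Hypothetical grouping of the rows above into four effect kinds.
type EffectKind = "read" | "draft" | "low_risk_write" | "high_risk";

interface EffectPolicy {
  autoExecute: boolean; // may the executor run it without a person?
  requirement: string;  // what must accompany the effect
}

const effectPolicies: Record<EffectKind, EffectPolicy> = {
  read: { autoExecute: true, requirement: "record the source" },
  draft: { autoExecute: true, requirement: "candidate only; never committed directly" },
  low_risk_write: { autoExecute: true, requirement: "reviewable diff" },
  high_risk: { autoExecute: false, requirement: "human confirmation" },
};

// Retry schedule for an external API: capped exponential back-off.
// Returns one delay per retry, so maxAttempts total attempts means maxAttempts - 1 delays.
function backoffDelaysMs(maxAttempts: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, retry)));
  }
  return delays;
}

console.log(backoffDelaysMs(4, 100, 300)); // [ 100, 200, 300 ]
```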
Architecture: separate control plane and execution plane
The control plane should be stable, predictable, and auditable; the execution plane can be flexible and plug in different tools per task. This mirrors platform designs where a scheduler does not perform business logic directly.
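Here is a minimal sketch of the split, using hypothetical names. ControlPlane registers tools, dispatches calls to them, and audits every call. The tools themselves form the execution plane. The handlers are synchronous only to keep the example short; real tool calls would usually be async.

```typescript
// Execution plane: pluggable tool handlers (kept synchronous here for brevity).
type ToolHandler = (input: string) => string;

// Control plane: registers tools, dispatches to them, and keeps an audit trail.
// It contains no task-specific logic itself.
class ControlPlane {
  private readonly tools = new Map<string, ToolHandler>();
  private readonly audit: string[] = [];

  register(toolName: string, handler: ToolHandler): void {
    this.tools.set(toolName, handler);
  }

  dispatch(toolName: string, input: string): string {
    const handler = this.tools.get(toolName);
    if (handler === undefined) {
      this.audit.push(`rejected: ${toolName}`);
      throw new Error(`unknown tool: ${toolName}`);
    }
    this.audit.push(`dispatched: ${toolName}`);
    return handler(input);
  }

  auditTrail(): readonly string[] {
    return this.audit;
  }
}

// Usage: each task plugs in different tools; the control plane stays the same.
const plane = new ControlPlane();
plane.register("read_ci_log", (jobId) => `log for job ${jobId}`);
console.log(plane.dispatch("read_ci_log", "17")); // log for job 17
```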
Choosing the first Loop scenario
CI failure triage: clear input, log references, easy human review.
Document command validation: readable files, runnable commands, clear failure evidence.
PR risk pre‑check: explicit diff; can output a candidate risk list.
Dependency upgrade impact scan: limited to directories/packages; suitable for report generation.
Changelog candidate generation: read‑heavy, write‑light; results are editable by humans.
All share the principle: read more, write less, keep evidence clear, and hand off on failure.
Common pitfalls
Using chat history as the sole state store
Chat context gets compressed over long tasks, and details are lost. Store structured state in markdown files, issues, databases, or event logs so that runs can be replayed and reconciled.
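Storing events instead of relying only on chat text turns replay into a simple fold. The following is a minimal sketch, assuming a pure reducer like the reduce function in the demo. replay rebuilds state from the event log, and reconcile checks a stored snapshot against the rebuilt state. The demo reducer and event names are hypothetical.

```typescript
// Replay: the current state is the initial state folded over the event log.
function replay<S, E>(reducer: (state: S, event: E) => S, initial: S, events: readonly E[]): S {
  return events.reduce<S>(reducer, initial);
}

// Reconciliation: does replaying the log reproduce the stored snapshot?
// JSON comparison is a simplification; it assumes plain data with stable key order.
function reconcile<S, E>(
  reducer: (state: S, event: E) => S,
  initial: S,
  events: readonly E[],
  snapshot: S,
): boolean {
  return JSON.stringify(replay(reducer, initial, events)) === JSON.stringify(snapshot);
}

// Hypothetical demo: state is the list of evidence sources collected so far.
type DemoEvent = { type: "COLLECTED"; source: string } | { type: "RESET" };
const demoReducer = (state: string[], event: DemoEvent): string[] =>
  event.type === "COLLECTED" ? [...state, event.source] : [];

const eventLog: DemoEvent[] = [
  { type: "COLLECTED", source: "ci/job.log" },
  { type: "COLLECTED", source: "pr/diff" },
];
console.log(replay(demoReducer, [], eventLog).join(", ")); // ci/job.log, pr/diff
console.log(reconcile(demoReducer, [], eventLog, ["ci/job.log"])); // false
```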
Letting the Agent decide all permissions
The model can suggest actions, but permission boundaries must be encoded in the system: read‑only versus write‑allowed paths, API‑only calls, and actions that require human approval.
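Here is a minimal sketch of such boundaries, expressed as data plus a default-deny check. The path prefixes and action names are hypothetical. Plain prefix matching also assumes that paths are already normalized; a real check should resolve ".." segments and symlinks first.

```typescript
// Permission boundaries live in code, not in the prompt. All prefixes below are hypothetical.
interface PermissionPolicy {
  readOnly: readonly string[];      // path prefixes the Agent may read but not write
  writable: readonly string[];      // path prefixes the Agent may also write
  humanApproval: readonly string[]; // actions that always go to a person
}

type Decision = "allow" | "deny" | "needs_approval";

function checkPermission(policy: PermissionPolicy, action: string, path: string): Decision {
  if (policy.humanApproval.indexOf(action) !== -1) return "needs_approval";
  const under = (prefixes: readonly string[]) => prefixes.some((p) => path.startsWith(p));
  if (action === "read") return under(policy.readOnly) || under(policy.writable) ? "allow" : "deny";
  if (action === "write") return under(policy.writable) ? "allow" : "deny";
  return "deny"; // default-deny anything the policy does not name
}

const ciPolicy: PermissionPolicy = {
  readOnly: ["src/", "ci/logs/"],
  writable: ["docs/"],
  humanApproval: ["deploy", "change_permissions", "delete_data"],
};
console.log(checkPermission(ciPolicy, "write", "src/app.ts")); // deny
console.log(checkPermission(ciPolicy, "deploy", "prod"));      // needs_approval
```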
Lack of independent verification
Code tasks have tests and builds; document tasks need link checks; operational tasks need source citations and approval flows. Without external verification the Loop may treat a claim of completion as actual completion.
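Here is a minimal sketch of verification that does not trust the executor's own claim. ExecutorReport, Check, and the artifact keys are hypothetical. Each check inspects the artifacts the executor produced, such as a test exit code or a broken-link count. An empty list of checks never counts as success.

```typescript
// What the executor hands back: a claim plus raw artifacts (keys are hypothetical).
interface ExecutorReport {
  claimsDone: boolean;
  artifacts: Record<string, string>;
}

// An independent check reads artifacts only; it never sees how the executor reasoned.
interface Check {
  label: string;
  passes(artifacts: Record<string, string>): boolean;
}

function verifyReport(report: ExecutorReport, checks: readonly Check[]): { done: boolean; failed: string[] } {
  // report.claimsDone is deliberately ignored: a claim of completion is not evidence.
  const failed = checks.filter((c) => !c.passes(report.artifacts)).map((c) => c.label);
  // With no checks at all, nothing was verified, so the result is not done.
  return { done: checks.length > 0 && failed.length === 0, failed };
}

const checks: Check[] = [
  { label: "tests exit 0", passes: (a) => a["testExitCode"] === "0" },
  { label: "no broken links", passes: (a) => a["brokenLinks"] === "0" },
];
const verdict = verifyReport({ claimsDone: true, artifacts: { testExitCode: "1", brokenLinks: "0" } }, checks);
console.log(verdict.done, verdict.failed.join(", ")); // false tests exit 0
```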
Ignoring the process trace
Trace data reveals mis‑classifications, flaky tools, and overly permissive rules, guiding future improvements to skills, memory, tools, and prompts.
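As one example of mining the trace, the following sketch flags tools that sometimes fail and sometimes succeed, which is a common sign of flakiness. The ToolCallTrace shape and the tool names are hypothetical. Detecting misclassifications or overly permissive rules would require other fields in the trace.

```typescript
// One trace entry per tool call (hypothetical shape).
interface ToolCallTrace {
  round: number;
  tool: string;
  ok: boolean;
}

// Failure rate per tool, computed from the trace rather than from the final summary.
function failureRates(entries: readonly ToolCallTrace[]): Map<string, number> {
  const totals = new Map<string, { calls: number; failures: number }>();
  for (const entry of entries) {
    const t = totals.get(entry.tool) ?? { calls: 0, failures: 0 };
    t.calls++;
    if (!entry.ok) t.failures++;
    totals.set(entry.tool, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, tool) => rates.set(tool, t.failures / t.calls));
  return rates;
}

// A tool that sometimes fails and sometimes succeeds is a flakiness suspect.
function flakyTools(entries: readonly ToolCallTrace[]): string[] {
  const suspects: string[] = [];
  failureRates(entries).forEach((rate, tool) => {
    if (rate > 0 && rate < 1) suspects.push(tool);
  });
  return suspects;
}

const traceEntries: ToolCallTrace[] = [
  { round: 1, tool: "read_ci_log", ok: true },
  { round: 2, tool: "read_ci_log", ok: false },
  { round: 3, tool: "read_ci_log", ok: true },
  { round: 3, tool: "classify", ok: true },
];
console.log(flakyTools(traceEntries)); // [ 'read_ci_log' ]
```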
Relation to previous articles
Earlier pieces introduced Harness (the shell that runs the Agent) and Environment (the world the Agent sees). Loop Engineering sits inside this environment as the task runtime that structures state, actions, verification, and commits.
Conclusion
Loop Engineering adds an Agent to familiar loop concepts. By first providing an explicit skeleton—state, intent, action, verification, commit, and trace—a Loop becomes maintainable, observable, and replayable, solving many real‑world problems even before any advanced AI tricks are added.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
