Loop Engineering: The Next Evolution Beyond Harness Engineering in AI Coding
The article introduces Loop Engineering as a new AI coding paradigm that builds on Harness Engineering, explains its primitives, contrasts it with cron‑style automation, outlines suitable use cases, and provides a practical checklist for engineers to adopt reliable, context‑aware agent loops.
From Prompt to Loop
Prompt Engineering focuses on how to phrase a single model call, including role definition, output format, few‑shot examples, tone, and constraints, with a human judging each response. Context Engineering ensures the model sees the right material—RAG, repo maps, AGENTS.md, memory, progress files, and progressive skill disclosure—yet the model can still drift.
Harness Engineering, described in Addy Osmani’s April article, provides the runtime scaffolding around an agent: prompts, tools, context policies, hooks, sandbox, sub‑agents, feedback loops, and recovery paths, enabling the model to act reliably in an engineered environment.
Loop Engineering sits one layer above Harness. It coordinates a set of agents or automated tasks to continuously discover work, assign tasks, verify results, and persist state.
What Loop Actually Is
Osmani breaks Loop Engineering into six primitives: automations, worktrees, skills, plugins/connectors, sub-agents, and state. Together they form a loop capable of sustained operation.
Claude Code’s scheduled-tasks documentation now describes a /loop command. It accepts a natural-language task description such as “every 5 minutes check my PR build, and if it fails read the logs and fix it,” and also lets you set the interval and expiry directly.
Developers Digest contrasts this with driving Claude Code from an external cron job, where each claude -p call is a cold start with no memory, no persistent MCP connection, and no tool context.
Claude Code’s loop lives within the current session, inheriting context, tool permissions, MCP, and skills, and can continue reasoning based on the previous round’s outcome. It also supports a dynamic mode where Claude decides the next wake‑up time based on observed state.
Codex shows a similar shape: automations discover and triage on a schedule, /goal runs until completion, worktrees isolate parallel tasks, skills capture project knowledge, and connectors integrate external systems. Though tool names differ, the architecture is comparable.
Unlike cron, which only decides *when* to run, a loop must decide *what* to do next based on observed state.
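A minimal sketch of that difference, assuming a hypothetical Observation record and decide_next_step policy (neither is part of Claude Code or Codex). The cron-style function knows only its interval, while the loop-style function reads the observed state and chooses both the next action and the next wake-up delay.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of external state read at the start of a round."""
    ci_status: str            # "pending", "failed", or "passed"
    unresolved_comments: int


def cron_next_step(_: Observation) -> tuple[str, int]:
    """Cron-style: same action and fixed interval, regardless of state."""
    return ("run_script", 300)


def decide_next_step(obs: Observation) -> tuple[str, int]:
    """Loop-style: pick the action and next wake-up delay (seconds) from state."""
    if obs.ci_status == "failed":
        return ("read_logs_and_fix", 0)        # act immediately
    if obs.unresolved_comments > 0:
        return ("address_review_comments", 0)
    if obs.ci_status == "pending":
        return ("wait", 120)                   # nothing to do yet, check again soon
    return ("stop", 0)                         # CI passed and no comments remain
```

The action names and delays are placeholders; the point is only that the return value depends on what was observed, not on the clock.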
Harness Control Plane
Harness is the agent’s execution environment, packaging the model with tools, context, permissions, state, hooks, sandbox, logging, and recovery paths so it can act on real engineering infrastructure.
Loop sits above harness, orchestrating when to start an agent, which agent to assign, how to isolate parallel work, which external state to read, what acceptance criteria to apply, and when to stop.
Martin Fowler splits a coding‑agent harness into guides (pre‑action feed‑forward controls such as rules, skills, architecture docs, context injection) and sensors (post‑action feedback controls like tests, lint, type‑check, browser regression, logs, static analysis).
Anthropic’s long‑running harness follows the same pattern, dividing tasks into planner, generator, and evaluator. The planner expands requirements into verifiable contracts, the generator implements them, and the evaluator checks real‑world outcomes (pages, APIs, databases) and feeds failures back.
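The planner, generator, and evaluator split can be sketched roughly as follows. This is an illustration under stated assumptions, not Anthropic's implementation: plan, generate, evaluate, and run are hypothetical names, the "artifact" is a plain dict standing in for real pages or APIs, and the generator is a stub that simply satisfies whatever the evaluator reports.

```python
from typing import Callable

Check = Callable[[dict], bool]   # a verifiable contract that inspects the artifact


def plan(requirement: str) -> dict[str, Check]:
    """Planner: expand a requirement into named, checkable contracts (hard-coded here)."""
    return {
        "has_endpoint": lambda art: "/health" in art.get("routes", []),
        "returns_ok": lambda art: art.get("status") == 200,
    }


def generate(artifact: dict, failures: list[str]) -> dict:
    """Generator stand-in: 'implements' whatever the evaluator reported as failing."""
    updated = dict(artifact)
    if "has_endpoint" in failures:
        updated["routes"] = updated.get("routes", []) + ["/health"]
    if "returns_ok" in failures:
        updated["status"] = 200
    return updated


def evaluate(artifact: dict, contracts: dict[str, Check]) -> list[str]:
    """Evaluator: return the names of contracts the artifact does not satisfy."""
    return [name for name, check in contracts.items() if not check(artifact)]


def run(requirement: str, max_rounds: int = 3) -> tuple[dict, list[str]]:
    """Drive generate/evaluate rounds until all contracts pass or the budget ends."""
    contracts = plan(requirement)
    artifact: dict = {}
    failures = evaluate(artifact, contracts)
    for _ in range(max_rounds):
        if not failures:
            break
        artifact = generate(artifact, failures)   # failures are fed back
        failures = evaluate(artifact, contracts)
    return artifact, failures
```

Calling run("add a health check") returns the final artifact together with any contracts still failing, so the caller can tell a finished task from an exhausted budget.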
The control plane hierarchy runs user goal → planner → execution agent → tool layer → sensors → state layer, with hooks intercepting the lifecycle before tool calls, after tool calls, and at session termination.
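The three hook points (before a tool call, after a tool call, at session end) might look like this in a toy harness. The Harness class and its attribute names are hypothetical and do not mirror any specific product's hook API; the sketch only shows where each interception sits in the lifecycle.

```python
from typing import Callable


class Harness:
    """Hypothetical harness that wraps tool calls with lifecycle hooks."""

    def __init__(self) -> None:
        self.pre_tool: list[Callable[[str, dict], bool]] = []     # return False to block
        self.post_tool: list[Callable[[str, object], None]] = []  # observe results
        self.on_end: list[Callable[[], None]] = []                # run at session end

    def call(self, name: str, tool: Callable[..., object], **args) -> object:
        # Pre-tool hooks act as guards: any False vetoes the call.
        if not all(hook(name, args) for hook in self.pre_tool):
            raise PermissionError(f"tool call blocked by hook: {name}")
        result = tool(**args)
        # Post-tool hooks receive the result, e.g. to feed sensors or logs.
        for hook in self.post_tool:
            hook(name, result)
        return result

    def end_session(self) -> None:
        for hook in self.on_end:
            hook()


log: list[str] = []
harness = Harness()
harness.pre_tool.append(lambda name, args: name != "deploy")                      # guard
harness.post_tool.append(lambda name, result: log.append(f"{name} -> {result}"))  # sensor feed
harness.on_end.append(lambda: log.append("state persisted"))                      # state layer
```

With this setup, harness.call("add", lambda a, b: a + b, a=2, b=3) goes through and is logged, while a call named "deploy" raises PermissionError before the tool runs.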
Putting harness and loop together clarifies boundaries: harness ensures reliability of a single action (appropriate tools, clean context, safe commands, test feedback, isolated workspace); loop ensures reliability of continuous progress (new tasks, waiting for external state, spawning helpers, advancing rounds, writing results back to issues, progress files, or triage inboxes).
Without harness, a loop merely repeats model calls; without loop, harness still requires human triggering, assignment, and acceptance.
Loop Is Not Cron
Cron suits fixed mechanical jobs—daily scripts, hourly file syncs, nightly reports—where no context understanding is needed.
GitHub Actions fit cloud‑reliable tasks such as CI/CD, PR checks, and scheduled security scans, providing logs, permission boundaries, and persistence across machine restarts.
Loop excels at ad‑hoc automation within a development session: waiting for CI, review, deployment, or long tests, monitoring state changes, reading feedback, and taking the next step.
Its advantage is that context remains alive. Its drawbacks are that the session ends when the window closes, long-running loops incur token costs, and permissions and tools must be tightly controlled.
Therefore, loops are not meant for long‑running background jobs; for tasks that need days or weeks of reliability, use Actions, cloud routines, or dedicated automation systems.
Tasks That Fit Loop Well
To decide if a task merits a loop, evaluate three criteria: clarity of feedback, narrow boundaries, and recoverable failure.
CI‑based fixes are ideal—each round reads failure logs, applies a fix, reruns tests, and yields clear results.
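A bounded version of that round structure, as a sketch. The run_tests and apply_fix callables are hypothetical stand-ins for a real test runner and an agent call; the toy usage below replaces them with a list of fake failures so the control flow can be followed end to end.

```python
from typing import Callable, Optional


def ci_fix_loop(
    run_tests: Callable[[], Optional[str]],   # returns a failure log, or None when green
    apply_fix: Callable[[str], None],         # hypothetical agent call that edits code
    max_rounds: int = 5,
) -> tuple[bool, int]:
    """Return (passed, fix_rounds_used). Stops on green or when the budget is spent."""
    for round_no in range(1, max_rounds + 1):
        failure_log = run_tests()
        if failure_log is None:
            return True, round_no - 1
        apply_fix(failure_log)
    # Budget spent: run the tests one last time to report the final state.
    return run_tests() is None, max_rounds


# Toy stand-ins: two failures, and each "fix" removes the one it was shown.
bugs = ["NameError in utils.py", "AssertionError in test_api.py"]
passed, rounds = ci_fix_loop(
    run_tests=lambda: bugs[0] if bugs else None,
    apply_fix=lambda failure: bugs.remove(failure),
)
```

The max_rounds budget is the hard stop condition: a fix that never converges ends the loop with passed set to False instead of running indefinitely.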
PR babysitting also fits: review comments, merge conflicts, and CI status are external signals an agent can continuously read and act upon.
Dependency upgrades, documentation sync, log inspection, and post‑long‑test failure analysis are similarly suitable because each round generates new evidence.
Conversely, large, vague re‑architectures or UX improvements lack concrete external acceptance criteria, causing loops to run many rounds without meaningful progress.
Operations that modify production permissions, payment flows, or security chains must be handled cautiously; loops that can send messages, edit issues, push branches, query databases, or deploy services essentially automate external actions and should default to read‑only, draft, or PR‑only modes, with human confirmation when needed.
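One way to encode those defaults is a small authorization gate. Everything here is an illustrative assumption: the Mode enum, the action names, and the split between mode-gated and human-gated actions are hypothetical, and a real loop would map its own tools onto such a table.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read_only"
    DRAFT = "draft"        # may prepare changes locally, never publish them
    PR_ONLY = "pr_only"    # may push a branch and open a PR, nothing more


# Hypothetical action names grouped by the least-privileged mode that allows them.
READ_ACTIONS = {"read_logs", "query_ci_status"}
DRAFT_ACTIONS = READ_ACTIONS | {"write_local_patch", "draft_comment"}
PR_ACTIONS = DRAFT_ACTIONS | {"push_branch", "open_pr"}

# External side effects that always require explicit human confirmation.
NEEDS_HUMAN = {"deploy_service", "send_message", "edit_issue", "query_database"}

ALLOWED = {
    Mode.READ_ONLY: READ_ACTIONS,
    Mode.DRAFT: DRAFT_ACTIONS,
    Mode.PR_ONLY: PR_ACTIONS,
}


def authorize(action: str, mode: Mode = Mode.READ_ONLY, human_confirmed: bool = False) -> bool:
    """Allow an action only if the mode permits it or a human has confirmed it."""
    if action in NEEDS_HUMAN:
        return human_confirmed
    return action in ALLOWED[mode]
```

Defaulting mode to READ_ONLY means a loop that forgets to declare its permissions can observe but not act, which is the safer failure direction.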
Engineers’ New Role
Boris Cherny notes the shift: from writing code, to prompting models to write code, and now to designing systems that keep agents working continuously.
Such systems require clear goals, toolsets, external state, validators, and stop conditions. Prompts remain useful but are embedded in a larger feedback loop.
Open‑source harness suites are already addressing this: oh‑my‑openagent on OpenCode orchestrates multiple agents, model routing, Oracle verification, and edit protocols; oh‑my‑claudecode leverages Claude Code plugins, skills, sub‑agents, hooks, and the .omc state file to manage lifecycles. Projects like ECC and GSD package cross‑harness methodology—rules, skills, evaluation, worktrees, fresh context, and goal‑backward verification—as transferable assets.
The key difference among these projects is not the number of agents but the control plane: how work is divided, verified, isolated, recovered, and when execution stops.
Practical Checklist
Use this simple checklist to decide whether to automate a task with a loop:
Can the task be automatically verified?
Can failures be rolled back?
Are the token and compute costs controllable?
Are the boundaries well defined?
If any answer is no, first add tests or acceptance criteria, reduce permissions, or introduce planning gates before automating.
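The checklist can be expressed as a small readiness check. The TaskProfile fields follow the four questions above; which remediation is paired with which question is an assumption made for illustration, and the round and token budget in particular is a suggested fix that the checklist itself does not name.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    auto_verifiable: bool
    rollback_possible: bool
    cost_bounded: bool
    boundaries_defined: bool


# Remediation per failed question (illustrative pairing, not from the article).
REMEDIATION = {
    "auto_verifiable": "add tests or explicit acceptance criteria",
    "rollback_possible": "reduce permissions so every change is reversible",
    "cost_bounded": "set a round and token budget",
    "boundaries_defined": "introduce a planning gate to narrow scope",
}


def loop_readiness(task: TaskProfile) -> list[str]:
    """Return the fixes needed before automating; an empty list means ready."""
    return [fix for field, fix in REMEDIATION.items() if not getattr(task, field)]
```

For example, loop_readiness(TaskProfile(False, True, True, False)) lists the test and planning-gate fixes, while a profile that answers yes to all four questions returns an empty list.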
Avoid forcing every workflow into a loop; low-information-gain cycles waste time and tokens and generate noise.
High‑value loops should drive decisions: does the round’s result tell you to continue, stop, roll back, or hand off to a human?
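A sketch of a round-level decision function, assuming the loop tracks a count of failing checks before and after each round. The Decision enum, the decide function, and the stall_limit threshold are all hypothetical; the thresholds are placeholders to tune per task.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    STOP = "stop"
    ROLL_BACK = "roll_back"
    HAND_OFF = "hand_off"


def decide(
    failing_before: int,
    failing_after: int,
    rounds_without_progress: int,
    stall_limit: int = 2,
) -> Decision:
    """Map one round's outcome to the loop's next move."""
    if failing_after == 0:
        return Decision.STOP          # goal met
    if failing_after > failing_before:
        return Decision.ROLL_BACK     # the round made things worse
    stalled = failing_after == failing_before
    if stalled and rounds_without_progress + 1 >= stall_limit:
        return Decision.HAND_OFF      # no new information, so ask a human
    return Decision.CONTINUE
```

A round that leaves the failure count unchanged is treated as low information gain: the loop tolerates it once by default, then hands off instead of spending more tokens.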
Summary
Loop Engineering may be a fleeting buzzword or evolve under a new name, but the shift is real: AI coding competition moves from “can the model write code?” to “can the system keep agents progressing within real engineering constraints?”
The real learning focus is not endless agent execution but building verifiable loops: clear goals, reliable sensors, constrained permissions, and hard stop conditions.
Teams should start with a small closed loop, such as CI fixes, PR comment handling, log inspection, or dependency upgrades, exercise the goal, tools, checks, stop condition, and rollback path end-to-end, and only then expand scope.
Future engineers will increasingly act as harness designers: coding remains, but greater value lies in crafting environments, feedback mechanisms, and control planes for AI agents.