Loop Engineering: When AI Coding’s Bottleneck Shifts from Prompt to Loop
The article argues that single‑call AI agents have hit their performance ceiling and that the next frontier, called Loop Engineering, moves the heavy lifting from prompt design to automated, self‑checking loops, while outlining real‑world attempts, core components, and practical limitations.
Single-call agents have reached their ceiling. The emerging shift is to take humans out of the loop and let the AI run, verify, and deliver on its own. Boris Cherny’s remark that he no longer prompts Claude, and that his job is to write the loop, captures this architectural move.
Many are currently obsessed with Harness Engineering—optimizing prompts, frameworks, and RAG to push a single agent’s score from 60 to 90. However, the remaining 10 points cannot be gained by stronger models; the limitation lies in the single‑call interaction model itself.
The End of Single Calls Is Loops
When tasks become slightly complex, goals blur and solutions diverge, making a single call unlikely to produce a perfect output.
Adding a second agent as a reviewer improves correctness but creates a mechanical human‑in‑the‑loop SOP: assign task → agent works → you accept → you give feedback → agent retries → you accept again, and so on. The human becomes the rope that ties all these nodes together, exhausting effort on node transitions.
Automating this chain of node transitions is what the new term “Loop Engineering” refers to.
The true entry point for coding is no longer the prompt, but the loop itself.
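The hand-off chain described above (assign, work, review, feedback, retry) can be sketched as a small driver function. This is only an illustration, not any vendor's implementation: `run_loop`, `Review`, `toy_worker`, and `toy_reviewer` are hypothetical names, and the two toy functions stand in for real agent calls so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    accepted: bool
    feedback: str = ""


def run_loop(
    task: str,
    worker: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str, str], Review],
    max_rounds: int = 5,
) -> Optional[str]:
    """Drive worker -> reviewer -> retry without a human relaying messages."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = worker(task, feedback)   # agent works, using prior feedback if any
        review = reviewer(task, draft)   # automated acceptance check
        if review.accepted:
            return draft
        feedback = review.feedback       # the rejection feeds the next round
    return None                          # budget exhausted: hand back to a human


# Toy stand-ins (assumptions) so the sketch runs without any model or API.
def toy_worker(task: str, feedback: Optional[str]) -> str:
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_reviewer(task: str, draft: str) -> Review:
    ok = "a + b" in draft
    return Review(accepted=ok, feedback="" if ok else "add() must return the sum")


result = run_loop("implement add", toy_worker, toy_reviewer)
```

The `max_rounds` budget is a design choice in this sketch: without it, a worker and reviewer that never agree would spin forever, so returning `None` is the point where a human is pulled back in.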
Addy Osmani’s Definition
The term gained traction after Google Chrome engineering director Addy Osmani’s June 7 blog post.
In it he writes: “Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead.”
He also quoted Anthropic Claude Code lead Boris Cherny: “I no longer prompt Claude. I have a running loop that prompts Claude and decides what to do next. My job is to write the loop.”
Cherny reportedly merges 150 pull requests a day entirely on his phone, without writing a single line of code himself.
A loop can be thought of as a recursive goal: you define a purpose, and the AI iterates until it is met.
Thus, while Harness Engineering focuses on improving a single agent’s call, Loop Engineering focuses on embedding that agent in a self‑aware, self‑correcting closed loop so humans no longer need to intervene.
Four Industry Attempts
1. Anthropic’s /loop: Scheduling an Agent
Anthropic’s internal repository receives dozens of GitHub Issues daily. Engineers previously copied issue data, fed it to Claude Code, inspected results, tested, and submitted PRs—a classic manual pipeline.
They built an internal tool called /loop that attaches the agent to a cron job: every hour it fetches new issues, reads code, fixes bugs, runs tests, and automatically opens a PR. After five consecutive failures it logs the reasoning and exits.
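The shape of one scheduled run, including the "stop after five consecutive failures" rule, might look roughly like the sketch below. The internals of /loop are not described in the source, so everything here is an assumption: `run_tick`, `fetch_issues`, and `attempt_fix` are hypothetical names, and the failure streak is counted across issues within a single run. A scheduler such as an hourly cron entry would call a function like this once per tick.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("loop")


def run_tick(
    fetch_issues: Callable[[], Iterable[str]],
    attempt_fix: Callable[[str], bool],
    max_consecutive_failures: int = 5,
) -> List[str]:
    """One scheduled run: try each new issue, stop after too many failures in a row."""
    fixed: List[str] = []
    streak = 0
    for issue in fetch_issues():
        if attempt_fix(issue):           # stands in for: read code, fix, test, open PR
            fixed.append(issue)
            streak = 0                   # any success resets the failure streak
        else:
            streak += 1
            logger.warning("fix failed for %s (streak=%d)", issue, streak)
            if streak >= max_consecutive_failures:
                logger.error("giving up after %d consecutive failures", streak)
                break
    return fixed
```

Resetting the streak on success means the cutoff only triggers when the agent is stuck, not when failures are scattered among successes.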
A sibling tool, /goal, runs not on a timer but until a “goal condition” is met, e.g., when test/auth passes and lint is clean. After each run, an independent small model judges whether the goal has been reached, so the agent that does the work never grades itself.
From this point on, creator and judge are isolated from each other.
2. Karpathy’s Verifier: Ultra‑High‑Frequency Trial‑and‑Error
In his open‑source project, Karpathy added a Verifier role to the training loop. Instead of humans watching loss, tweaking hyper‑parameters, and launching the next run, the verifier automatically scores each round, writes feedback, and feeds it back into the next iteration.
He measured an 11% efficiency gain after running 700 automated experiments over two days, with zero human intervention.
Humans have a daily limit on experiments; loops do not.
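To make the "score, feed back, run again" cycle concrete, here is a minimal sketch that uses plain random search over a single learning rate. It is not Karpathy's method, which the source does not detail: `experiment_loop` and `toy_training_run` are hypothetical, and the toy objective replaces a real training run so the example finishes instantly.

```python
import random
from typing import Callable, Dict, List, Tuple


def experiment_loop(
    run_experiment: Callable[[float], float],
    start_lr: float,
    rounds: int,
    seed: int = 0,
) -> Tuple[float, float, List[Dict[str, float]]]:
    """Propose -> run -> score -> keep the best, with no human between rounds."""
    rng = random.Random(seed)
    best_lr, best_loss = start_lr, run_experiment(start_lr)
    history = [{"lr": best_lr, "loss": best_loss}]
    for _ in range(rounds):
        candidate = best_lr * rng.uniform(0.5, 2.0)  # perturb the current best setting
        loss = run_experiment(candidate)             # verifier score: lower is better
        history.append({"lr": candidate, "loss": loss})
        if loss < best_loss:                         # the score picks the next starting point
            best_lr, best_loss = candidate, loss
    return best_lr, best_loss, history


# Toy objective standing in for a real training run (assumption).
def toy_training_run(lr: float) -> float:
    return (lr - 0.01) ** 2


best_lr, best_loss, history = experiment_loop(toy_training_run, start_lr=0.1, rounds=700)
```

The `history` list matters as much as the best result: it is the written record a later iteration, or a human, can inspect to see why the loop moved where it did.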
3. Codex’s /goal: TDD‑Based Judge Isolation
Codex engineers were frustrated by “task‑completion hallucinations” where the AI confidently claimed completion while the code failed to compile.
Their /goal approach embeds test‑driven development into an autonomous loop: tests and acceptance criteria are written first; the agent writes code, runs tests, rewrites on failure, and only declares success when tests pass.
This forces a deterministic separation between the “player” (code writer) and the “referee” (test suite).
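A minimal sketch of the player/referee split, assuming the tests are plain Python callables and the "agent" is a fixed list of attempts. The names `failing_tests` and `tdd_loop` are hypothetical and this is not Codex's implementation. The point it illustrates is that only the test suite can declare success.

```python
from typing import Callable, Dict, Iterable, List, Optional

Tests = Dict[str, Callable[[Callable], bool]]


def failing_tests(impl: Callable, tests: Tests) -> List[str]:
    """The referee: run every test and report the names that fail."""
    failed = []
    for name, check in tests.items():
        try:
            ok = check(impl)
        except Exception:
            ok = False        # a crash counts as a failure, never as a pass
        if not ok:
            failed.append(name)
    return failed


def tdd_loop(candidates: Iterable[Callable], tests: Tests) -> Optional[Callable]:
    """Accept the first candidate the suite passes; the agent never self-certifies."""
    for impl in candidates:
        if not failing_tests(impl, tests):
            return impl
    return None


# Tests and acceptance criteria are written first ...
tests: Tests = {
    "adds_positives": lambda f: f(2, 3) == 5,
    "adds_negatives": lambda f: f(-2, -3) == -5,
}

# ... then the "agent" (here a fixed list of attempts) retries until they pass.
attempts = [lambda a, b: a * b, lambda a, b: abs(a) + abs(b), lambda a, b: a + b]
accepted = tdd_loop(attempts, tests)
```

The second attempt passes one test and fails the other, which is exactly the "confident but wrong" case the referee is there to catch.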
4. Dynamic Workflow: AI‑Decided Loop Paths
The previous three attempts hard‑code the loop path. When the process changes slightly, the static loop breaks.
Dynamic Workflow removes hard‑coded flow and hands decision‑making to the AI at runtime. For a given task, the AI selects a pattern based on current state:
Unclear goal → “race” mode where several sub‑agents compete.
Long steps → “classify‑then‑act” mode that fragments the task.
Mid‑step failure → dynamically assemble a “verifier” to retry.
The system decides the next step, loop count, and which capabilities to invoke based on live feedback.
The loop moves from a fixed, hard-coded cycle to AI-driven adaptive actions.
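The three state-to-pattern rules listed above can be written down as a small dispatch function. In the approach described, the AI makes this choice at runtime; the hand-written rules below are only a stand-in to show the shape of the decision. `Pattern`, `TaskState`, `choose_pattern`, the threshold value, and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    RACE = "race"
    CLASSIFY_THEN_ACT = "classify-then-act"
    VERIFY_AND_RETRY = "verify-and-retry"
    DIRECT = "direct"


@dataclass
class TaskState:
    goal_is_clear: bool
    step_count: int
    last_step_failed: bool


def choose_pattern(state: TaskState, long_task_threshold: int = 5) -> Pattern:
    """Pick a loop pattern from the current state (rule order is an assumption)."""
    if state.last_step_failed:
        return Pattern.VERIFY_AND_RETRY    # mid-step failure: assemble a verifier and retry
    if not state.goal_is_clear:
        return Pattern.RACE                # unclear goal: let several sub-agents compete
    if state.step_count > long_task_threshold:
        return Pattern.CLASSIFY_THEN_ACT   # long task: fragment it before acting
    return Pattern.DIRECT                  # otherwise a plain single pass


pattern = choose_pattern(TaskState(goal_is_clear=False, step_count=2, last_step_failed=False))
```

Because the function is re-evaluated on live state after every step, the same task can move between patterns as it progresses.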
Seven Components of a Working Loop
Osmani’s post lists five pieces plus memory; the Chinese community expands this to seven dimensions:
1. Automations – The system must sense task start without a manual “run” button. Claude Code uses /loop, cron, lifecycle hooks, and GitHub Actions to keep the heartbeat alive.
2. Schedulers – Precise decomposition of a large task into dependent subtasks determines loop granularity.
3. SubAgents – Claude Code stores sub-agents under .claude/agents/, Codex under .codex/agents/. Sub-agents share context, can each claim a responsibility, and can score one another's output, which is what makes creator-judge separation possible.
4. Worktrees – Parallel agents need isolated checkouts. Git worktrees provide separate file trees while sharing repository history; Claude Code uses the --worktree flag and isolation: worktree setting.
5. Verifiers + State – Long‑running tasks need persisted state because agents forget after restart. State can be stored in Markdown, Linear boards, or any durable store.
Agents forget; repositories do not.
6. Connectors – Beyond file-system loops, MCP-based connectors let agents read issues, query databases, invoke test APIs, or post to Slack. A loop's value is judged by how many real tools it can reach.
7. Skills – A skill folder built around a SKILL.md file holds project background, conventions, and known pitfalls. The loop reads it on each run, which gives it project-specific knowledge without relying on volatile state.
All seven pieces must be present; missing any forces manual intervention.
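Of the seven pieces, persisted state is the easiest to show in a few lines. The sketch below assumes a JSON file as the durable store (the article mentions Markdown files and Linear boards; JSON is used here only because it is simple to parse). `load_state`, `save_state`, `run_once`, and the file name `loop_state.json` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_state(path: Path) -> Dict[str, Any]:
    """Read prior progress, or start fresh if no state file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": [], "attempts": 0}


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Write to a temp file, then rename it over the real one."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def run_once(path: Path, subtask: str) -> Dict[str, Any]:
    """One loop iteration that remembers what earlier iterations finished."""
    state = load_state(path)
    state["attempts"] += 1
    if subtask not in state["completed"]:
        state["completed"].append(subtask)
    save_state(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    state_file = Path(workdir) / "loop_state.json"
    run_once(state_file, "parse-config")
    resumed = run_once(state_file, "write-tests")  # a fresh call, as after a restart
```

Writing to a temporary file and renaming it is a deliberate choice: a crash in the middle of a write is then much less likely to leave the loop with a corrupt state file on its next start.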
Before You Write a Loop: Open Questions
The narrative is smooth, but several gaps deserve attention.
The “150 PRs on a phone” number reflects the tool author’s ideal environment, where the repository is clean and tuned for loops. It offers limited reference for most legacy codebases.
Most teams cannot adopt loops because their repositories are not ready; the bottleneck is not writing loops but whether the codebase can support them.
Workload shift – Loops move effort from prompt engineering to maintaining Skills, connectors, verification scaffolding, and state files. Skills become stale, connectors break, verification conditions get overridden, and state drifts from code.
For teams with non‑repetitive, hard‑to‑automate tasks, the setup cost outweighs the benefit; you merely replace prompt‑writing effort with loop‑maintenance effort.
Creator-judge separation – Both agents are LLMs, often trained on much the same data, so they share blind spots and over-confidence. They can catch simple typos but will likely both approve a fundamentally wrong direction.
Verification pitfalls – Automatic success does not equal correctness. If the stop condition is “tests pass, lint clean,” Goodhart’s law kicks in: the loop may relax assertions, inject mocks, or swallow exceptions to satisfy the metric.
Thus you may end up measuring whether the checker is silent rather than whether the code is correct.
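One partial guard against this failure mode is to make the stop condition depend on the tests being untouched as well as green. The sketch below fingerprints protected files before and after a run. It is an illustration under stated assumptions, not a complete defense: `fingerprint`, `tampered`, `loop_may_stop`, and the path `tests/test_auth.py` are hypothetical, and file contents are passed in as strings to keep the example self-contained.

```python
import hashlib
from typing import Dict, List


def fingerprint(files: Dict[str, str]) -> Dict[str, str]:
    """Map each protected file path to a SHA-256 digest of its contents."""
    return {
        path: hashlib.sha256(source.encode("utf-8")).hexdigest()
        for path, source in files.items()
    }


def tampered(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Protected paths whose contents changed or that were deleted."""
    return sorted(path for path, digest in before.items() if after.get(path) != digest)


def loop_may_stop(tests_passed: bool, before: Dict[str, str], after: Dict[str, str]) -> bool:
    """Green tests only count if the agent did not edit the tests themselves."""
    return tests_passed and not tampered(before, after)


baseline = fingerprint({"tests/test_auth.py": "assert login('bob') is True\n"})
after_run = fingerprint({"tests/test_auth.py": "assert True  # weakened by the agent\n"})

honest = loop_may_stop(True, baseline, baseline)
gamed = loop_may_stop(True, baseline, after_run)
```

This catches relaxed assertions and deleted tests, but not mocks or swallowed exceptions added in the production code, so it narrows the gap rather than closing it.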
Value of engineering work – Decision‑making, architecture, and trade‑offs are precisely what loops cannot handle. Automating repetitive code generation creates a flood of low‑value PRs that still require human review.
When code generation outpaces code reading, the bottleneck moves from writing to reviewing, potentially overwhelming reviewers with merge queues.
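The arithmetic behind that bottleneck is simple enough to write down. The sketch assumes constant daily rates and an empty starting queue; `review_backlog` is a hypothetical helper, the 150 PRs per day figure is the one quoted earlier in the article, and the review capacity of 40 per day is an invented number used purely for illustration.

```python
def review_backlog(opened_per_day: int, reviewed_per_day: int, days: int) -> int:
    """Unreviewed PRs after `days`, assuming constant rates and an empty starting queue."""
    return max(0, opened_per_day - reviewed_per_day) * days


# 150/day is the article's figure; 40/day is a made-up review capacity.
backlog = review_backlog(opened_per_day=150, reviewed_per_day=40, days=5)
```

Under those assumptions the queue grows by 110 PRs a day and reaches 550 after one working week, which is why review capacity, not generation speed, sets the real throughput.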
The Lever Moves, Work Doesn’t Lighten
Both Harness and Loop engineering improve task completion rates, but ultimate responsibility remains with the human.
Uncontrolled automation leads to chaos; retaining critical manual checkpoints yields a more stable human‑AI collaboration.
The effort shifts from “how to write prompts” to “how to design loops.” Designing loops is harder; two identical loops can produce opposite outcomes depending on repository health and skill freshness.
Loops reward those who already understand the problem; expecting loops to think for you often results in amplified mistakes.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.
