Deep Dive into Loop Engineering: From Prompt Engineering to System Design

Loop Engineering replaces manual prompting with designed loops that let AI agents iterate autonomously. This article covers its definition, origins, five core modules plus memory, a full-stack example, experimental results, limitations, and a comparison between Claude Code and Codex.

Linyb Geek Road

1. Core Definition: What Is Loop Engineering?

Loop Engineering, as defined by Google Cloud AI Engineering Director Addy Osmani, means using a system you design to prompt an agent instead of prompting the agent yourself. In other words, you write a Loop rather than a Prompt, so the model can keep working, even while you sleep, repeatedly pursuing a defined goal until it is met.

2. Background: Two Cognitive Leaps at Anthropic

According to Boris Cherny, the creator of Claude Code, Anthropic engineers have gone through two major shifts, with a third now emerging:

First leap (≈1.5 years ago): From writing code to writing Prompts, treating the model as a code‑generating assistant.

Second leap (ongoing): From writing Prompts to designing Loops, where a Loop orchestrates the agent instead of direct interaction.

Third leap (in progress): Toward autonomous collaboration among multiple agents, where humans only define business goals.

3. The Five Core Modules (+1 Memory Mechanism)

Addy Osmani identifies five essential components that both OpenAI Codex and Anthropic Claude Code implement, plus a persistent memory layer.

Module 1: Automation

Automation turns a Loop into a true recurring process. In Codex, tasks are created on the “Automations” tab with a project, a Prompt, and a schedule; results go to a triage inbox or are auto-archived. Claude Code uses the /loop command with cron-style intervals or lifecycle hooks. Its /goal command is the key piece: it keeps running until a user-defined condition becomes true, and it delegates the judgment of whether that condition is met to a separate small model.
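The "run until a condition becomes true" idea can be sketched in a few lines. This is a minimal illustration, not how /goal is implemented: `run_until_goal`, `fake_agent`, and `fake_judge` are hypothetical names, and the iteration cap is an added assumption so the loop cannot run forever.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    done: bool          # True if the judge accepted an output
    iterations: int     # number of agent turns actually used
    last_output: str    # most recent output from the worker


def run_until_goal(
    step: Callable[[str], str],
    goal_met: Callable[[str], bool],
    initial: str = "",
    max_iterations: int = 10,
) -> LoopResult:
    """Call `step` repeatedly until `goal_met` accepts its output or the budget runs out."""
    output = initial
    for i in range(1, max_iterations + 1):
        output = step(output)      # worker: one agent turn
        if goal_met(output):       # judge: a separate check, not the worker's own claim
            return LoopResult(True, i, output)
    return LoopResult(False, max_iterations, output)


# Toy stand-ins for the agent and the judging model (both hypothetical).
def fake_agent(previous: str) -> str:
    return previous + "x"


def fake_judge(output: str) -> bool:
    return len(output) >= 3


result = run_until_goal(fake_agent, fake_judge)
print(result)
```

Keeping the judge as a separate callable mirrors the point in the text: the completion decision is made outside the worker.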

Module 2: Worktree Isolation

Running multiple agents can cause file conflicts. The solution is Git worktrees: each agent works in an isolated directory on its own branch, sharing repository history but preventing cross‑writes. Codex has built‑in worktree support; Claude Code enables it with the --worktree flag and isolation: worktree configuration.
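As a rough sketch of what isolation looks like when scripted by hand, the helper below builds a `git worktree add` command for each agent. The Git subcommand and its `-b` option are standard; the `agent/<id>` branch naming, the sibling-directory layout, and the function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent_id: str, base: str = "main") -> list[str]:
    """Build the `git worktree add` command for one agent's isolated checkout."""
    branch = f"agent/{agent_id}"                        # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-wt-{agent_id}"   # sibling directory, also an assumption
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, agent_id: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if Git refuses (e.g. the branch exists)."""
    subprocess.run(worktree_command(repo, agent_id, base), check=True)


# Show the command that would be run, without touching any repository.
print(" ".join(worktree_command(Path("/tmp/proj"), "fix-42")))
```

Each agent then gets its own directory and branch while sharing the same object store, so two agents cannot overwrite each other's working files.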

Module 3: Skill

Skills encapsulate reusable intent and context so the agent does not need a full project briefing each run. Both tools store each Skill as a folder containing a SKILL.md file with instructions and metadata, plus optional scripts and assets. Skills are invoked with $ or /skills. They also solidify intent, turning one-off prompts into cumulative knowledge.
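To make the Skill layout concrete, here is a small sketch that splits a SKILL.md into metadata and instructions. It assumes the metadata sits in a leading block delimited by `---` lines with simple `key: value` pairs; the field names and the sample text are illustrative, and a real loader would use a proper YAML parser.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (metadata, instructions).

    Assumes a leading block delimited by '---' lines holding 'key: value' pairs.
    Returns ({}, text) when no such block is found.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":                 # closing delimiter: the rest is the body
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text                               # no closing delimiter: treat as plain text


# Hypothetical skill content, for illustration only.
sample = (
    "---\n"
    "name: triage\n"
    "description: Summarize CI failures and open issues\n"
    "---\n"
    "Read the CI log, then list actionable items.\n"
)
meta, body = parse_skill(sample)
print(meta["name"], "->", body)
```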

Module 4: Connectors

Connectors, built on the MCP protocol, let agents interact with issue trackers, databases, staging APIs, or Slack. Both Codex and Claude Code support MCP, so a Connector written for one often works for the other. Plugins bundle Connectors with Skills for easy distribution.

Module 5: Sub‑Agent

Separating code generation from code review improves reliability. Sub-Agents are defined in files under .codex/agents/ (TOML) or .claude/agents/ (Markdown with YAML frontmatter). A common pattern uses an Explorer, an Implementer, and a Verifier. Sub-Agents consume more tokens but focus verification where it matters most.
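The Explorer, Implementer, Verifier pattern can be sketched as plain function composition. This is an illustration under stated assumptions, not either tool's API: the three roles are hypothetical callables, and the retry-with-feedback step is an added design choice. The point is that the Verifier sees only the task and the patch, never the Implementer's reasoning.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    reason: str


def run_pipeline(
    task: str,
    explore: Callable[[str], str],
    implement: Callable[[str, str, str], str],
    verify: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> tuple[str, Verdict, int]:
    """Explore once, then alternate implement/verify until approved or out of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    context = explore(task)
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        patch = implement(task, context, feedback)
        verdict = verify(task, patch)      # verifier gets task + patch only
        if verdict.approved:
            return patch, verdict, attempt
        feedback = verdict.reason          # only the rejection reason flows back
    return patch, verdict, max_attempts


# Toy roles (hypothetical): the first draft lacks tests, the second adds them.
def toy_explore(task: str) -> str:
    return f"notes for {task}"


def toy_implement(task: str, context: str, feedback: str) -> str:
    return "patch with tests" if feedback else "patch"


def toy_verify(task: str, patch: str) -> Verdict:
    ok = "tests" in patch
    return Verdict(ok, "ok" if ok else "missing tests")


patch, verdict, attempts = run_pipeline("fix login bug", toy_explore, toy_implement, toy_verify)
print(patch, verdict, attempts)
```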

+1 Memory Mechanism

A persistent markdown file or Linear board records what has been done and what remains, because large language models retain nothing between runs. The memory file lives on disk, ensuring continuity even when the agent restarts.
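A memory file of this kind can be as simple as a markdown checklist. The sketch below assumes a `- [x]` / `- [ ]` task-list format and a file named MEMORY.md; both are illustrative choices rather than a convention either tool prescribes.

```python
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict[str, bool]:
    """Read '- [x] item' / '- [ ] item' lines into {item: done}; a missing file means empty memory."""
    items: dict[str, bool] = {}
    if not path.exists():
        return items
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("- [x] "):
            items[line[6:]] = True
        elif line.startswith("- [ ] "):
            items[line[6:]] = False
    return items


def save_memory(path: Path, items: dict[str, bool]) -> None:
    """Write the checklist back to disk so the next run can resume from it."""
    lines = [f"- [{'x' if done else ' '}] {name}" for name, done in items.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# Round trip in a temporary directory: one run saves, the "next run" loads.
with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "MEMORY.md"
    save_memory(memory, {"fix flaky login test": True, "bump lockfile": False})
    restored = load_memory(memory)
    remaining = [name for name, done in restored.items() if not done]

print(remaining)
```

Because the state is plain text on disk, a human can read or edit it between runs, which is part of its appeal.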

4. A Complete Loop in Practice

Every morning an automated task runs in the repository. It invokes a Triage Skill that reads yesterday’s CI failures, open issues, and recent commits, then writes findings to a markdown file or Linear board. For each actionable item, the Loop creates a separate worktree, spawns a Sub‑Agent to draft a fix, and a second Sub‑Agent to verify the draft against project Skills and tests. Connectors automatically open PRs, update tickets, and post status to Slack. Unhandled items go to a human‑review inbox. A state file tracks progress so the next run resumes where the previous one left off.
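The routing logic of such a daily run can be sketched as follows. Everything here is a stand-in: `daily_run`, `toy_draft`, and `toy_verify` are hypothetical, the drafting and verifying callables would be Sub-Agents in practice, and opening a PR is reduced to appending to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunReport:
    opened_prs: list[str] = field(default_factory=list)
    human_inbox: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)


def daily_run(
    items: list[str],
    already_done: set[str],
    draft_fix: Callable[[str], Optional[str]],
    verify_fix: Callable[[str, str], bool],
) -> RunReport:
    """Route each triaged item to a PR, the human inbox, or 'skipped' if handled earlier."""
    report = RunReport()
    for item in items:
        if item in already_done:               # state from the previous run
            report.skipped.append(item)
            continue
        patch = draft_fix(item)                # in practice: a Sub-Agent in its own worktree
        if patch is not None and verify_fix(item, patch):
            report.opened_prs.append(item)     # in practice: a Connector opens the PR
            already_done.add(item)
        else:
            report.human_inbox.append(item)    # no draft or failed verification
    return report


# Toy stand-ins for the drafting and verifying Sub-Agents.
def toy_draft(item: str) -> Optional[str]:
    return None if "design" in item else f"patch for {item}"


def toy_verify(item: str, patch: str) -> bool:
    return "flaky" not in item


done = {"bump lockfile"}
report = daily_run(
    ["bump lockfile", "fix typo in README", "fix flaky test", "design new auth flow"],
    done,
    toy_draft,
    toy_verify,
)
print(report)
```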

5. Self‑Correction Experiment with Claude Fable 5

Lance Martin at Anthropic ran a “Parameter Golf” experiment: train a model on eight H100 GPUs in under ten minutes, with the result fitting within a 16 MB artifact. The Loop edited training code, launched training, polled logs, read scores, and decided the next experiment.
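The shape of such an experiment loop, propose, run, score, keep the best valid result, can be sketched with toy stand-ins. None of this reproduces the actual experiment: the scoring direction (lower is better), reading 16 MB as 16 × 1024 × 1024 bytes, the `width` parameter, and the toy cost model are all assumptions for illustration.

```python
from typing import Callable, Optional

MAX_ARTIFACT_BYTES = 16 * 1024 * 1024  # 16 MB cap; the exact byte count is an assumption

Config = dict[str, int]
History = list[tuple[Config, float, bool]]


def search(
    propose: Callable[[Optional[Config], History], Config],
    run_experiment: Callable[[Config], tuple[float, int]],
    rounds: int,
) -> tuple[Optional[Config], float, History]:
    """Keep the best-scoring config whose artifact fits the size cap (lower score is better)."""
    best_config: Optional[Config] = None
    best_score = float("inf")
    history: History = []
    for _ in range(rounds):
        config = propose(best_config, history)     # decide the next experiment
        score, size = run_experiment(config)       # launch training, then read the score
        valid = size <= MAX_ARTIFACT_BYTES
        history.append((config, score, valid))
        if valid and score < best_score:
            best_config, best_score = config, score
    return best_config, best_score, history


# Toy stand-ins: wider models score better but produce larger artifacts.
WIDTHS = [64, 128, 256, 192]


def toy_propose(best: Optional[Config], history: History) -> Config:
    return {"width": WIDTHS[len(history)]}


def toy_run(config: Config) -> tuple[float, int]:
    width = config["width"]
    return 1000 / width, width * 100_000


best_config, best_score, history = search(toy_propose, toy_run, rounds=4)
print(best_config, best_score)
```

Recording invalid runs in the history, rather than discarding them, gives the proposer something to learn from on the next round.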

Key finding: having an independent verification Sub‑Agent score the output is far better than the model self‑scoring, because the scoring occurs in a separate context window. The CMA Outcomes feature automatically creates such a scoring Sub‑Agent.

Results: Fable 5 improved the training pipeline roughly six‑fold compared with Opus 4.7. Fable 5 made larger structural bets (e.g., architecture changes) and showed greater resilience, such as surviving a quantization rollback, whereas Opus 4.7 only achieved incremental scalar tweaks.

Memory usage comparison on an SQL sequential‑question task:

Sonnet 4.6 stopped after the first step, storing only failures and guesses, with little reference to prior notes.

Opus 4.7 stopped after the third step, building a partially uncertain reference model with low coverage (7‑33%).

Fable 5 completed the full path, achieving 73% verification coverage and extracting generic rules.

6. Three Things Loops Can’t Do

Verification remains the human’s responsibility: An unsupervised Loop can still produce erroneous code; the “completed” claim is a statement, not proof.

Understanding debt grows: Faster Loop output widens the gap between generated code and the developer’s mental model unless the developer reviews the results.

Inaction is a risk: A Loop that runs without judgment may accept any output, leading to cognitive surrender. Designing Loops with built‑in judgment mitigates this.

7. Claude Code vs. Codex: Tool Comparison

Both tools share the same five core modules, differing only in naming and entry points. (Image omitted for brevity.)

8. Paradigm Shift: From Prompt Engineering to Loop Engineering

The lever moves from Prompt to Loop design. Previously, a well‑crafted Prompt yielded good results; now, the quality of the designed Loop determines output quality. The same Loop can produce vastly different outcomes for different users, depending on whether they use it to deepen understanding or to avoid it.

Designing effective Loops requires deep engineering expertise and a sufficient token budget, because the system must be carefully configured and supervised.

9. Three Engineer Levels

L1: Manually write code line by line.

L2: Write Prompts for an agent to generate code (dialogue‑based output).

L3: Design Loops that let agents iterate automatically (systematic output).

Loop Engineering is the key that moves engineers from L2 to L3.
