How Warp’s Dual‑Loop Architecture Enables Self‑Improving AI Agents
The article explains Warp founder Zach Lloyd’s practical dual‑loop system that lets AI agents automatically learn from human‑corrected feedback, continuously refining their task‑execution skills through an inner execution loop and an outer improvement loop.
What is a self‑improvement loop?
The core idea is simple: an Agent should not be a one-time tool. It should learn from the feedback on each run, much as a person improves with experience.
Example: an Issue-classification Agent routes new GitHub Issues into three buckets: ready-to-implement, duplicate, and needs-info. The first version of its Skill covers about 70% of cases. Traditionally, developers would correct the remaining mistakes by hand and edit the Skill themselves.
How the dual‑loop is built
Inner Loop (the working Agent)
This Agent actually performs the task. For Issue classification, its workflow is:
GitHub receives a new Issue.
A GitHub Action is triggered.
The action calls Warp’s cloud Agent platform, Oz.
The Agent reads the Skill file (a markdown document containing classification rules).
The Agent applies labels to the Issue.
All execution records are stored – as log files, Slack messages, or comments on the Issue.
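The Inner Loop steps above can be sketched in a few lines of Python. This is a minimal, tool-agnostic sketch, not Warp's Oz implementation. It makes three assumptions: the model call is replaced by a hypothetical `classify` callable, the GitHub labelling step is left as a comment, and the JSONL log file, `run_inner_loop`, and the `skill_sha` field are illustrative names.

```python
import hashlib
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# The three buckets from the Issue-classification example.
LABELS = {"ready-to-implement", "duplicate", "needs-info"}


@dataclass
class Issue:
    number: int
    title: str
    body: str


# Hypothetical stand-in for the model call: (skill_text, issue) -> label.
Classifier = Callable[[str, Issue], str]


def run_inner_loop(issue: Issue, skill_path: Path, classify: Classifier, log_path: Path) -> str:
    """One Inner Loop run: read the Skill, pick a label, record the decision."""
    skill_text = skill_path.read_text(encoding="utf-8")  # the Skill is a plain markdown file
    label = classify(skill_text, issue)
    if label not in LABELS:
        raise ValueError(f"classifier returned unknown label {label!r}")
    # A real trigger would apply the label through the GitHub API here; this sketch only logs it.
    record = {
        "issue": issue.number,
        "agent_label": label,
        # Fingerprint of the Skill version used, so later runs can be grouped by Skill version.
        "skill_sha": hashlib.sha256(skill_text.encode("utf-8")).hexdigest()[:12],
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return label
```

In production, `classify` would send the Skill text and the Issue to a model. A keyword stub is enough to exercise the loop locally. Writing one JSON line per run keeps the execution record traceable, which is what the Outer Loop relies on.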
Outer Loop (the improvement Agent)
This supervisory Agent does not handle Issues directly; it runs periodically (e.g., daily) to evaluate the Inner Loop’s performance.
Its steps are:
Collect all Issues processed by the Inner Loop.
Identify those whose labels were manually corrected.
Read the human‑provided rationale (e.g., a comment explaining why the label was changed).
Generate a diff that updates the Skill file based on the feedback.
Submit a PR; once merged, the Inner Loop uses the updated Skill on the next run.
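The first three Outer Loop steps boil down to comparing what the Inner Loop logged with the Issue's current state. This is a minimal sketch. It assumes the run records, current labels, and reviewer comments have already been fetched into plain dictionaries, for example from a run log and the GitHub API. `find_corrections` and `Correction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    issue: int
    agent_label: str
    human_label: str
    rationale: str


def find_corrections(runs, current_labels, comments):
    """Outer Loop steps 1-3.

    runs: Inner Loop records like {"issue": 7, "agent_label": "ready-to-implement"}
    current_labels: issue number -> label the Issue carries now (after any human edits)
    comments: issue number -> list of human comments explaining the change
    """
    # Step 1: collect processed Issues, keeping only the latest run per Issue.
    latest = {run["issue"]: run for run in runs}
    corrections = []
    for number, run in sorted(latest.items()):
        human_label = current_labels.get(number)
        # Step 2: an unchanged (or unknown) label carries no feedback signal.
        if human_label is None or human_label == run["agent_label"]:
            continue
        # Step 3: the reviewer's comments become the rationale the Outer Loop reasons over.
        rationale = " ".join(comments.get(number, []))
        corrections.append(Correction(number, run["agent_label"], human_label, rationale))
    return corrections
```

Each `Correction` is a self-contained unit of feedback. The remaining steps, turning it into a Skill diff and a PR, can then handle one correction at a time.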
The key insight is that a Skill is just a file. It can be edited programmatically, reviewed through a normal Git workflow, and improved continuously.
Real‑world case: manual correction triggers improvement
Suppose the Inner Loop labels an Issue as “ready‑to‑implement,” but a reviewer sees that the request is ambiguous and changes the label to “needs‑info,” adding a comment:
This feature’s scope is unclear; the author should specify whether it is a global setting or project‑level.
The next day the Outer Loop runs and:
Detects the label change.
Reads the reviewer’s comment.
Infers that “unclear scope” is a new decision dimension.
Updates the Skill to add a rule: “If a new feature is mentioned without a clear scope, label as needs‑info.”
Once the diff is merged, the Inner Loop applies the new rule on its next run, so the same mistake should not recur.
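Because the Skill is plain text, the update in this case can be expressed as an ordinary unified diff. The sketch below uses Python's standard `difflib`. It assumes the new rule text has already been produced; in the described system, an LLM infers it from the reviewer's comment. The `skills/issue-triage.md` path and `propose_skill_diff` are hypothetical.

```python
import difflib


def propose_skill_diff(skill_text: str, new_rule: str, path: str = "skills/issue-triage.md") -> str:
    """Append one rule to the Skill and return a unified diff for the PR ("" if nothing to do)."""
    if new_rule in skill_text:
        return ""  # rule already present: don't open a duplicate PR
    old = skill_text if skill_text.endswith("\n") else skill_text + "\n"
    new = old + f"- {new_rule}\n"
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


skill = "# Issue triage\n- Label exact repeats of an open Issue as duplicate.\n"
rule = "If a new feature is mentioned without a clear scope, label as needs-info."
print(propose_skill_diff(skill, rule))
```

Keeping each diff to a single added rule matches the "incremental updates" idea: a reviewer can accept or reject one lesson at a time.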
Why this architecture matters
Traditional Agent improvement relies on developers manually tweaking prompts, rules, or adding few‑shot examples, which is labor‑intensive and prone to missing edge cases.
The dual‑loop approach offers four advantages:
Feedback from real scenarios – improvements are driven by production data rather than imagined test cases.
Incremental updates – each change addresses a single error, similar to small Git commits, reducing the risk of regressions.
Shifted human role – developers only need to comment on problems; the Outer Loop translates those comments into code changes.
Potential for unattended operation – with a clear metric (e.g., test pass rate or user satisfaction), the Outer Loop could run fully automatically.
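The "clear metric" condition for unattended operation can be made concrete as a merge gate. This is a minimal sketch. It assumes the metric is replay accuracy on past Issues whose final labels humans confirmed. The `min_accuracy` threshold of 0.9 and the function names are hypothetical choices, not part of the described system.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model call: (skill_text, issue_text) -> label.
Classifier = Callable[[str, str], str]
# A replay example: (issue text, label a human confirmed or corrected to).
Example = Tuple[str, str]


def accuracy(classify: Classifier, skill: str, examples: Sequence[Example]) -> float:
    """Share of historical examples the Skill labels the way humans did."""
    if not examples:
        return 0.0
    hits = sum(classify(skill, text) == label for text, label in examples)
    return hits / len(examples)


def should_auto_merge(classify: Classifier, old_skill: str, new_skill: str,
                      examples: Sequence[Example], min_accuracy: float = 0.9) -> bool:
    """Merge without a human only if the candidate is no worse than the current Skill
    and clears an absolute bar; otherwise leave the PR for review."""
    old = accuracy(classify, old_skill, examples)
    new = accuracy(classify, new_skill, examples)
    return new >= old and new >= min_accuracy
```

Requiring both conditions guards against two failure modes. The first is a change that fixes one case but breaks others. The second is a Skill that is "no worse" only because it was already poor.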
Where the architecture can be applied
Code-review Agent: learns from manual edits to its review suggestions.
Bug-fix Agent: learns from human-corrected fixes to improve root-cause identification.
Incident-response Agent: learns from historical incident handling to refine runbooks.
Two conditions are required:
Execution records must be traceable (logs, comments, Slack messages).
Clear feedback signals must exist (manual corrections, test results, performance metrics).
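The two conditions can be written as a filter that any of these Agents could apply before passing an event to its Outer Loop. The `FeedbackEvent` shape, the `Signal` kinds, and `usable_for_learning` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Signal(Enum):
    MANUAL_CORRECTION = "manual_correction"
    TEST_RESULT = "test_result"
    METRIC = "metric"


@dataclass
class FeedbackEvent:
    agent: str                       # e.g. "code-review", "bug-fix", "incident-response"
    trace: Optional[str]             # pointer to the log, Issue comment, or Slack thread for this run
    signal: Optional[Signal]
    agent_output: str
    corrected_output: Optional[str] = None


def usable_for_learning(event: FeedbackEvent) -> bool:
    """True only if both conditions hold: the run is traceable and a clear signal exists."""
    if not event.trace:
        return False  # condition 1: feedback can't be tied back to a specific run
    if event.signal is None:
        return False  # condition 2: nothing to learn from
    if event.signal is Signal.MANUAL_CORRECTION:
        # A "correction" that leaves the output unchanged carries no information.
        return event.corrected_output is not None and event.corrected_output != event.agent_output
    return True
```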
Technical implementation details
Warp uses its own Oz platform, but the concept is tool‑agnostic. The core components are:
Skill file: a markdown or other text file that encodes the task instructions.
Inner-loop trigger: a GitHub Action, cron job, or webhook that starts the Inner Loop.
Outer-loop scheduler: a periodic script that reads feedback, creates diffs, and opens PRs.
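The scheduler's last step, opening a PR, needs nothing beyond `git` and the GitHub CLI. This is a minimal sketch with several assumptions: a local clone with an `origin` remote, an authenticated `gh`, and a `main` base branch. The branch naming scheme and `open_skill_pr` are hypothetical. By default the function only returns the commands (a dry run), so the result can be inspected safely. A cron job or scheduled GitHub Action would call it once per run.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import List


def open_skill_pr(repo: Path, skill_rel: str, new_text: str, summary: str,
                  run_date: date, dry_run: bool = True) -> List[List[str]]:
    """Write the updated Skill on a fresh branch and open a PR for human review."""
    branch = f"skill-update/{run_date.isoformat()}"
    commands = [
        ["git", "checkout", "-b", branch],
        ["git", "add", skill_rel],
        ["git", "commit", "-m", f"Update {skill_rel}: {summary}"],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--base", "main", "--head", branch,
         "--title", f"Skill update: {summary}", "--body", summary],
    ]
    if dry_run:
        return commands
    # Uncommitted edits carry over to the new branch created by `git checkout -b`.
    (repo / skill_rel).write_text(new_text, encoding="utf-8")
    for cmd in commands:
        subprocess.run(cmd, cwd=repo, check=True)
    return commands
```

Opening a PR instead of pushing to `main` keeps a human in the loop by default. The merge step is the natural place to add an automatic gate later.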
If you use frameworks like LangChain or AutoGPT, you can store the Skill as a config file and have a simple Python script scan Git commits for human edits, then call an LLM to generate the updated config.
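The "scan Git commits for human edits" step could start from `git log`. This is a minimal sketch. It assumes agent commits can be recognised by author name: GitHub App bots conventionally end in `[bot]`, and any other agent identities are passed in explicitly. `LOG_FORMAT`, `read_log`, and `human_commits` are hypothetical names. The commits it keeps, together with their diffs, are what would be handed to the LLM alongside the current config.

```python
import subprocess
from typing import Dict, FrozenSet, List

# Tab-separated: full hash, author name, subject line (%x09 is a literal tab).
LOG_FORMAT = "%H%x09%an%x09%s"


def read_log(repo: str, path: str) -> str:
    """Raw history of one file, e.g. the agent's config or an output it produced."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--pretty=format:{LOG_FORMAT}", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def human_commits(log_text: str, agent_authors: FrozenSet[str] = frozenset()) -> List[Dict[str, str]]:
    """Keep only commits that look human-authored."""
    commits = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        sha, author, subject = line.split("\t", 2)
        if author.endswith("[bot]") or author in agent_authors:
            continue  # written by automation, not a human correction
        commits.append({"sha": sha, "author": author, "subject": subject})
    return commits
```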
Zach also linked an early version of an example repository for readers who want to see the code.
Some reflections
The pattern resembles continuous integration in software engineering: small, frequent changes are quickly validated and merged, avoiding large, risky refactors.
However, CI checks code correctness, whereas here the thing being validated is the reasonableness of the Agent's judgments. That has no absolute standard and can only be approximated from feedback.
The dual‑loop essentially automates “experience accumulation.” Humans no longer need to re‑teach the Agent each time; the Agent remembers each lesson.
A potential risk is biased human feedback – differing interpretations of “needs‑info” could lead to conflicting rules. Mitigation requires a rule‑merging mechanism or periodic human review of the Outer Loop’s changes.
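A very simple rule-merging check flags any condition that ends up mapped to more than one label. The sketch below assumes rules have already been extracted as (condition, label) pairs. It only catches conditions that match after lowercasing and whitespace normalisation. Catching paraphrased conflicts would need a semantic comparison, such as an LLM or embedding pass, which this sketch does not attempt. `normalize` and `find_conflicts` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(condition: str) -> str:
    """Lowercase and collapse whitespace so trivially different wordings compare equal."""
    return " ".join(condition.lower().split())


def find_conflicts(rules: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {condition: labels} for every condition mapped to two or more labels."""
    labels_by_condition = defaultdict(set)
    for condition, label in rules:
        labels_by_condition[normalize(condition)].add(label)
    return {c: sorted(labels) for c, labels in labels_by_condition.items() if len(labels) > 1}
```

Any conflict found this way would go to the periodic human review rather than being merged automatically.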
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Code Mala Tang
