8 Actionable Practices from Cursor’s Week‑Long, Million‑Line Coding Experiment
Cursor ran a team of AI coding agents for a week to build a prototype browser, uncovering three major failure modes—drift, collaboration breakdown, and lack of quality signals—and proposing a planner/worker split plus eight concrete tactics that ordinary developers can adopt for long‑running autonomous coding tasks.
Running code‑writing agents for extended periods (up to a week) surfaces three recurring failure modes:
Drift: early in the run the goal is clear, but after minutes to hours agents start adding peripheral changes, and after days they lose sight of the original objective.
Collaboration failures: when multiple agents edit the same files they generate many merge conflicts, and each agent tends to perform only small, safe edits rather than tackling harder problems.
Missing quality verification signals: without tests, lint checks, or explicit acceptance criteria, agents must guess when a task is complete, leading to unreliable outcomes.
Cursor’s response is to replace a monolithic “write‑code” agent with a two‑role collaboration structure that is simple, scalable, and resettable.
Planner : explores the repository, breaks the work into deliverable tasks, prioritises them, and can spawn sub‑planners for parallel modules.
Worker : receives a single, well‑defined task and implements it without needing to reason about global coordination.
This division reduces drift (the Planner keeps the goal aligned), prevents agents from limiting themselves to tiny safe changes (the Worker “drills through” tasks), and enables periodic restarts. After each cycle a review agent decides whether to continue; if not, the next cycle starts from a clean state, directly combating drift.
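To make the structure concrete, here is a minimal conceptual sketch of that cycle in TypeScript. Every name in it is hypothetical; the article does not describe Cursor’s internal interfaces.
// Conceptual sketch only; all names are invented, not Cursor's API.
interface Task { description: string; acceptanceCriteria: string }
interface Planner { plan(goal: string): Task[] }
interface Worker { execute(task: Task): Promise<boolean> } // true = criteria met
interface Reviewer { shouldContinue(results: boolean[]): boolean }

async function runCycles(goal: string, planner: Planner, workers: Worker[], reviewer: Reviewer) {
  while (true) {
    const tasks = planner.plan(goal); // the Planner keeps work aligned with the goal
    const results = await Promise.all(
      tasks.map((t, i) => workers[i % workers.length].execute(t)) // one well-defined task per Worker
    );
    if (!reviewer.shouldContinue(results)) break; // a review agent gates each cycle
    // the next cycle starts from a clean context, directly combating drift
  }
}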
8 Actionable Practices You Can Reuse
1) Let the agent write a plan before writing code
Replace a prompt such as “Help me implement X” with “Help me break X into a deliverable plan.” Example prompt:
You should not write code yet.
Read the repository and identify the most relevant files for this requirement.
Output an implementation plan: which files to change, why, and the acceptance criteria.
Include risk points and rollback strategies.
The plan surfaces direction errors early; it is not intended to speed up execution.
2) Give the agent a verifiable “completion signal”
State explicitly how the task is considered done. Common signals are:
All tests pass (green).
All lint checks pass.
The specified command’s output matches expectations.
Example completion criteria:
Completion criteria: npm test passes and the new tests cover X’s edge cases.
Do not stop until all tests pass.
Without a clear signal the agent relies on subjective judgment, which is unreliable.
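As a concrete (invented) example of such a signal, here is a test file the agent must turn green; parseDate and its edge cases are illustrative only, not from the original experiment:
// Hypothetical completion signal: the task is done only when these pass.
import { describe, it, expect } from "vitest";
import { parseDate } from "./parseDate";

describe("parseDate edge cases", () => {
  it("rejects an empty string", () => {
    expect(() => parseDate("")).toThrow();
  });
  it("handles leap days", () => {
    expect(parseDate("2024-02-29").getUTCDate()).toBe(29);
  });
});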
3) Let the agent search for context instead of feeding every file
Provide the goal and constraints; let the agent use repository search tools to locate relevant code. Manually specify files only when you know the exact entry point or need the agent to mimic an existing pattern.
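Example prompt (the goal and constraints here are invented for illustration):
Goal: add rate limiting to the public API endpoints.
Constraints: do not modify the authentication middleware; reuse the existing error‑handling pattern.
Search the repository yourself to find the relevant entry points before proposing changes.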
4) When a conversation drifts, start a new one
Long dialogues accumulate noise, making the agent slower, more erratic, and over‑confident. Opening a fresh conversation resets the context.
5) Use Rules for long‑term preferences and Skills for reusable processes
Rules store team‑wide conventions such as fixed commands, code‑style preferences, and directory conventions. Skills encode repeatable workflows, for example:
Generating PR descriptions.
Running tests repeatedly until they are green.
Writing unit tests from a team template.
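For illustration (these specific contents are invented, not taken from Cursor’s documentation), a Rule might read:
Always run npm run lint and npm test before declaring a task complete.
New modules go under src/modules/, one module per directory.
A Skill for “run tests until green” would then encode the loop itself: run npm test, read the failures, fix them, and repeat until the suite passes.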
6) Adopt test‑driven development for AI agents
Follow a four‑step loop:
Agent writes a failing test (implementation is prohibited).
You verify the test is correct.
Agent writes the implementation (modifying the test is prohibited).
Repeat until all tests pass.
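Here is a minimal sketch of one pass through that loop, using an invented slugify function and Vitest-style tests:
// --- slugify.test.ts --- (step 1: the agent writes a failing test first)
import { it, expect } from "vitest";
import { slugify } from "./slugify"; // does not exist yet, so the test fails

it("lowercases and hyphenates", () => {
  expect(slugify("Hello World")).toBe("hello-world");
});

// --- slugify.ts --- (step 3: the agent implements; editing the test is prohibited)
export function slugify(input: string): string {
  return input.trim().toLowerCase().replace(/\s+/g, "-");
}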
7) Run multiple ideas in parallel, but avoid parallel edits to the same code region
Use isolated worktrees so each agent works on a separate copy. Parallelism can be applied to:
Generating two independent implementations for the same requirement and selecting the better one.
Diagnosing a bug with two agents and comparing evidence.
Do not let multiple agents modify the same core module simultaneously, as this creates merge conflicts and wasted effort.
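For example, standard git worktrees give each agent an isolated checkout (the branch and path names here are illustrative):
git worktree add -b idea-a ../project-idea-a
git worktree add -b idea-b ../project-idea-b
Each agent then works in its own directory on its own branch, and you merge only the winning result.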
8) Debug tough bugs with an evidence‑driven approach
Cursor calls this “Debug Mode.” The steps are:
List several hypotheses.
Add minimal‑scope logging or instrumentation to collect data.
Based on the data, modify the code.
Prompt example:
You should not write a fix yet.
Propose three possible causes.
Add minimal‑scope logs to validate.
Only write the fix after obtaining runtime data.
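A sketch of what minimal‑scope instrumentation might look like while testing one hypothesis; the cache and every name below are invented:
interface Entry { value: string; createdAt: number }
declare const cache: Map<string, Entry>;

// Hypothesis 2: the cache serves stale entries after a TTL rollover.
// Log only at the suspected boundary; change no behaviour; remove the
// logs once runtime data has been collected.
function getCached(key: string): Entry | undefined {
  const entry = cache.get(key);
  console.debug("[debug][h2]", { key, hit: !!entry, ageMs: entry ? Date.now() - entry.createdAt : null });
  return entry;
}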
Key takeaways
Long‑running agents do not eliminate engineering management; they shift the bottleneck to decision‑making, acceptance, and quality assurance.
High quality requires explicit constraints such as tests, lint, and clear completion signals.
Periodic restarts and human oversight remain essential; agents become a scheduling and delivery mechanism rather than an autonomous code generator.