From Prompt Frenzy to Agent‑Driven AI Workflow: A 200k‑Line Real‑World Project Case Study
The article details a practical AI‑driven development workflow built on OpenSpec and SuperPowers for a 200,000‑line Flutter‑Node music app, explaining how dual documentation (AGENTS.md and START_HERE.md), sub‑agent review loops, and a single‑command execution model enforce strict engineering constraints, reduce hallucinations, and automate code delivery.
Origin
The author has been debugging a Flutter app with a Node backend for real‑time music practice, which involves tricky cases such as tie groups, tuplets, legato, voice‑track accuracy, and clock stability. Over time, a workflow took shape that constrains AI behavior and makes it easy to switch between AI coding tools such as Codex and Antigravity.
Hard Constraints and Governance: Division of START_HERE.md and AGENTS.md
To make AI write reliable code, clear specifications and boundaries must be defined first. The entry point for every AI session is AGENTS.md (or CLAUDE.md for Claude Code). All development activity starts from this file, which in turn points the AI to the static specifications.
The project’s tree structure (image omitted) shows the separation of responsibilities.
1. AGENTS.md (or CLAUDE.md) – Agent Session Guide
Specifies the reading order for the AI (entry file first, then capability specs).
Enforces engineering discipline such as TDD steps, logging standards, and exception handling.
Defines the OpenSpec workflow loop and sub‑agent review gate.
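One way to keep such a guide from silently losing a rule is to lint it. The sketch below is hypothetical. The section headings are assumptions rather than the author's actual AGENTS.md, and the check only confirms that each rule has a markdown heading. It says nothing about whether the rule itself is well written.

```python
# Hypothetical section headings: illustrative only, not the author's actual AGENTS.md.
REQUIRED_SECTIONS = (
    "Reading Order",
    "TDD",
    "Logging",
    "Exception Handling",
    "OpenSpec Workflow",
    "Sub-agent Review Gate",
)


def missing_sections(agents_md: str) -> list[str]:
    """Return the required sections that have no matching markdown heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in agents_md.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

Called on the text of AGENTS.md, it returns the names of any absent sections. A pre-commit hook could fail whenever that list is non-empty.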
2. START_HERE.md – System Specification and Architecture Knowledge Graph
This file is the sole entry point to the static architecture documentation. It contains no executable steps. Instead, it lists the directories and rules that define the system’s “static physical boundaries” and points to lower‑level spec documents:
architecture_ssot.md: Defines the six‑layer architecture boundaries (from L1 semantic input to L6 evaluation) and three single sources of truth (semantic, playback, and geometric coordinates). This is the architectural constraint for the Flutter side.
interface_contracts.md: Specifies the type contracts and boundary logic for each layer; any new field must be added to this document before the code changes.
musicxml_playback_omr_capability_ssot.md: Records the linkage rules between semantic parsing and playback performance.
These documents together define “what the system is” and “what is absolutely prohibited.” If a code change touches these static boundaries, the AI must stop and request human confirmation.
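The stop‑and‑ask rule can be approximated mechanically. The following is a minimal sketch, assuming the agent knows which files a change touches. The spec file names come from the article. Two things are illustrative assumptions: matching on file name alone (the real directory layout is not shown) and the BoundaryViolation type. The check is only a crude proxy, because a code change can cross an architectural boundary without editing any spec file, and a path check cannot detect that.

```python
from pathlib import PurePosixPath

# Spec file names are taken from the article; matching on the file name alone
# is an assumption, since the real directory layout is not shown.
STATIC_BOUNDARY_FILES = frozenset({
    "START_HERE.md",
    "architecture_ssot.md",
    "interface_contracts.md",
    "musicxml_playback_omr_capability_ssot.md",
})


class BoundaryViolation(Exception):
    """Raised when a change touches a static boundary without human sign-off."""


def check_static_boundaries(changed_paths, human_confirmed=False):
    """Stop the agent if any changed file is a static boundary spec.

    Returns the touched spec paths (empty when the change can proceed).
    """
    touched = sorted(
        p for p in changed_paths if PurePosixPath(p).name in STATIC_BOUNDARY_FILES
    )
    if touched and not human_confirmed:
        raise BoundaryViolation(
            "Static boundary touched; human confirmation required: " + ", ".join(touched)
        )
    return touched
```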
Why Not Merge the Two?
Merging would cause severe context bloat and attention drift for the AI. The static architecture definitions are large, while the AI’s context window is limited. Mixing the two would dilute the working rules in the AI’s context, making it more likely to forget TDD steps or skip review gates. Keeping them in separate files lets the AI load the specs only when a task needs them, which keeps the context clean and reduces token usage.
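To make the token argument concrete, here is a hypothetical sketch of on‑demand context assembly. The working rules are always loaded, and a spec is added only when the current task names it. The four‑characters‑per‑token estimate is a rough heuristic. build_context is an illustrative helper, not part of OpenSpec or SuperPowers.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token for English text);
    # a real agent would use its model's tokenizer.
    return max(1, len(text) // 4)


def build_context(rules: str, specs: dict[str, str], needed: list[str], budget: int) -> list[str]:
    """Return the document names to load, with the working rules first.

    The rules file is always included; specs are added only if the task needs
    them. Raises ValueError if the selection exceeds the token budget.
    """
    selected = ["AGENTS.md"]
    used = estimate_tokens(rules)
    for name in needed:
        used += estimate_tokens(specs[name])
        selected.append(name)
    if used > budget:
        raise ValueError(f"context needs {used} tokens, budget is {budget}")
    return selected
```

Because the rules file always comes first and is never traded away for a spec, the TDD and review-gate instructions stay in context no matter which specs a task pulls in.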
Google Stitch’s DESIGN.md is also referenced for UI conventions such as color tokens and naming principles.
Hands‑On Walkthrough: From Brainstorm to Three‑Stage Delivery
1. Use SuperPowers to Brainstorm the PRD
At the start, the user types /brainstorm in the chat, which activates SuperPowers’ brainstorming toolchain. Drawing on DESIGN.md and the existing design specs, the AI discusses the requirements with the user and outputs a complete PRD for the mixer’s core functions.
2. Three‑Stage Roadmap Planning
Alpha (Core Model & Clock): Implements the unified clock controller and playback‑state distribution.
Beta (Score Interaction & Loop): Implements measure‑click selection, A‑B looping, and synchronized visual highlighting.
Release (Singing Evaluation & Feedback): Integrates microphone audio analysis, a pitch‑evaluation overlay, and scoring cards.
Every concrete task within a stage maps to its own independent OpenSpec change.
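The stage‑to‑change mapping can be pictured as a small data structure. Of the change IDs in this sketch, only perf-runtime-01-model-foundation appears in the article. Placing it under Alpha, inventing the other IDs, and adding the uniqueness check are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    goal: str
    # One OpenSpec change ID per concrete task (IDs are hypothetical except
    # perf-runtime-01-model-foundation, which the article mentions).
    changes: list[str] = field(default_factory=list)


def validate_roadmap(stages):
    """Check that no change ID is shared by two tasks; return the total count."""
    seen = set()
    for stage in stages:
        for change_id in stage.changes:
            if change_id in seen:
                raise ValueError(f"change {change_id!r} is mapped to more than one task")
            seen.add(change_id)
    return len(seen)
```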
3. The Three‑Act Process for Each OpenSpec Change
Each tasks.md entry must end with three quality gates:
Step 1 – Implementation Report: The AI fills in a complete implementation report recording the code diffs, modified files, test commands, and any out‑of‑scope items.
Step 2 – Sub‑agent Review Loop: The AI spawns an isolated sub‑agent that reviews the change against the project’s architectural specs. The sub‑agent must return an explicit verdict. If it rejects the change (e.g., because the UI layer calls a player API directly or log statements are missing), the AI must revise and resubmit until it receives a PASS.
Step 3 – Human Review Gate: Once the sub‑agent review passes, the AI hands the change to the human reviewer and highlights the key hands‑on test points (e.g., “listen for clipping when skipping a track”). The human approves by replying APPROVED, after which the change is archived.
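The three gates form a simple control loop. The sketch below models the agent, the isolated reviewer, and the human as plain callables. These are hypothetical stand‑ins, not an OpenSpec or SuperPowers API. The article says the AI revises until it gets a PASS. The max_rounds cap and the ESCALATED status are assumptions added so that the loop cannot run forever.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    reasons: list[str]


def run_quality_gates(
    implement: Callable[[list[str]], str],
    review: Callable[[str], Verdict],
    human_reply: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Drive one change through the three gates and return its final status.

    implement(feedback) -> implementation report (Step 1)
    review(report)      -> isolated sub-agent verdict (Step 2)
    human_reply(report) -> the human reviewer's reply (Step 3)
    """
    feedback: list[str] = []
    for _ in range(max_rounds):
        report = implement(feedback)      # Step 1: write or revise the report
        verdict = review(report)          # Step 2: sub-agent review
        if verdict.passed:
            break
        feedback = verdict.reasons        # revise using the rejection reasons
    else:
        return "ESCALATED"                # no PASS within max_rounds
    if human_reply(report).strip() == "APPROVED":  # Step 3: human gate
        return "ARCHIVED"
    return "CHANGES_REQUESTED"
```

The rejection reasons are passed back into the next implementation round. This mirrors the article's description of the AI revising until the reviewer is satisfied.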
Minimalist Experience: One‑Command Driven Development Loop
After brainstorming the PRD and splitting it into Alpha, Beta, and Release stages, the developer only needs to type a single command in the chat, for example:
“Implement openspec change perf-runtime-01-model-foundation”.
The AI then autonomously completes the entire engineering loop.
Automated Development Flow Diagram
The following diagram (image omitted) shows the internal sequence of the AI main agent, sub‑agents, testing suites, and git operations.
In this self‑contained pipeline, the human no longer has to track scattered file modifications or copy test logs by hand. The AI uses the local test suite and the sub‑agent review as quality filters before anything is committed. After human confirmation, the AI runs git commit and git push. Later CI/CD steps (platform builds, Fastlane distribution, etc.) are planned but not yet in place.
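Here is a rough sketch of the delivery loop, assuming the coding itself is done by the agent and is not modeled here. Several pieces are assumptions:

- the command grammar;
- the test commands (flutter test for the app, npm test for the backend);
- the commit message;
- the review_and_approve callback, which stands in for the sub‑agent review plus the human APPROVED reply.

Only the git add, git commit, and git push invocations are standard git. The runner is injectable, so the flow can be dry‑run without touching a repository.

```python
import re
import subprocess
from typing import Callable, List, Sequence

# Hypothetical command grammar, modeled on the example in the article.
COMMAND = re.compile(r"implement openspec change\s+([a-z0-9][a-z0-9-]*)", re.IGNORECASE)

# Assumed test commands for the Flutter app and the Node backend.
TEST_COMMANDS = [["flutter", "test"], ["npm", "test"]]


def run_shell(cmd: Sequence[str]) -> int:
    """Default runner: execute a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def deliver(
    chat_text: str,
    review_and_approve: Callable[[str], bool],
    run: Callable[[Sequence[str]], int] = run_shell,
) -> List[List[str]]:
    """Run the one-command loop for a change; return the git commands executed.

    review_and_approve(change_id) stands in for the sub-agent review plus the
    human APPROVED reply. Nothing is committed unless tests and review pass.
    """
    match = COMMAND.fullmatch(chat_text.strip())
    if match is None:
        raise ValueError(f"not an OpenSpec implement command: {chat_text!r}")
    change_id = match.group(1)

    for cmd in TEST_COMMANDS:
        if run(cmd) != 0:
            raise RuntimeError(f"tests failed: {' '.join(cmd)}")

    if not review_and_approve(change_id):
        return []  # not approved: leave the working tree for another round

    git_cmds = [
        ["git", "add", "-A"],
        ["git", "commit", "-m", f"Implement openspec change {change_id}"],
        ["git", "push"],
    ]
    for cmd in git_cmds:
        if run(cmd) != 0:
            raise RuntimeError(f"git step failed: {' '.join(cmd)}")
    return git_cmds
```

Passing a fake runner that records commands and returns 0 shows the ordering: the tests run first, and git only runs after approval.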
After a period of practice, the author observed a significant reduction in AI hallucination and drift, although complex effects still require several rounds of debugging. The main drawback is high token consumption.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Tech Architecture Stories
Internet tech practitioner sharing insights on business architecture, technology, and a lifelong love of tech.
