OpenSpec Deep Dive: 4‑Step Review and 5 Upgrades to Make AI‑Generated Code Run Correctly
This article dissects OpenSpec’s four‑step AI programming workflow, exposing why completing the process often still yields buggy code, and proposes five concrete upgrades—including spec reviews, atomic task checks, runtime verification, TDD practices, and archive gate‑keeping—to close the quality gap.
Design Intent of the Four‑Step Method
OpenSpec (GitHub: Fission-AI/OpenSpec) aligns requirements with AI before coding using the workflow
/opsx:propose → /opsx:apply → /opsx:verify → /opsx:archive. Each stage produces artifacts (proposal.md, specs/, design.md, tasks.md) that depend on one another, providing traceability from intent to implementation.
Step‑by‑Step Gap Analysis
Propose – No quality guard for Specs
The propose phase creates all four artifacts at once, but there is no mechanism to verify that Specs (behavior contracts) are complete or correctly formatted. For example, a missing “####” heading in a Delta Spec is silently ignored during archiving, causing downstream errors.
Apply – Black‑Box execution without intermediate checks
Apply executes tasks sequentially without pause. If a bug appears in task 3, subsequent tasks build on the faulty code, amplifying errors. This is likened to constructing a ten‑storey building without inspecting each floor.
Verify – Textual comparison, not runtime validation
Verify checks whether the implementation matches the Spec text and whether design decisions appear in the code, but it never runs the code. Consequently, logical errors, race conditions, performance issues, and security flaws remain undetected.
Archive – Lacks quality gate‑keeping
Archive merges Delta Specs even if Verify flagged problems, allowing buggy changes to become the new baseline.
Root‑Cause Synthesis
The workflow guarantees documentation alignment but provides no code‑level verification. Two fundamental issues arise: (1) spec quality is never verified at any stage, and (2) AI has limited ability to translate complex designs into correct code, especially under context‑window pressure that can cause it to forget early decisions.
Five Upgrade Proposals
1. Spec Review after Propose
Format validation (ensure “####” headings, complete ADDED/MODIFIED/REMOVED tags).
Consistency check between Spec and original Proposal.
Boundary‑condition review for error handling and edge cases.
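The format‑validation part of this review can be automated. Below is a minimal sketch of such a checker; the heading and tag conventions it enforces ('####' requirement headings, ADDED/MODIFIED/REMOVED section tags) are assumptions about the Delta Spec layout, not OpenSpec's authoritative schema.

```python
import re

def review_spec(text: str) -> list[str]:
    """Collect format problems in a Delta Spec.

    A hypothetical checker: the heading and tag patterns below are
    assumed conventions, not pulled from OpenSpec's source.
    """
    problems = []
    # Every requirement should sit under a level-4 heading; a missing
    # '####' is exactly the silent-archiving failure described above.
    if not re.search(r"^#### ", text, flags=re.MULTILINE):
        problems.append("missing '####' requirement headings")
    # Every delta section should declare which operation it performs.
    tags = re.findall(r"^## (ADDED|MODIFIED|REMOVED)", text, flags=re.MULTILINE)
    if not tags:
        problems.append("no ADDED/MODIFIED/REMOVED section tags")
    return problems

good = "## ADDED Requirements\n#### Requirement: rate limiting\nThe API SHALL ...\n"
bad = "Rate limiting\nThe API SHALL ...\n"
print(review_spec(good))  # []
print(review_spec(bad))   # two problems reported
```

Running this immediately after Propose turns the silent heading failure into a loud one, before Apply ever starts.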
2. Atomic Apply tasks with checkpoints
Split apply into small tasks and insert a check after each:
apply-task-1 → check-1 → apply-task-2 → check-2 → … → apply-task-N → check-N
Confirm the code change matches the task description.
Verify no regression on previously completed tasks.
Run basic lint/static analysis.
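The checkpoint chain above can be sketched as a small driver. This is an illustrative loop, not an OpenSpec API: the task callables and the check predicate are placeholders for "apply one atomic change" and "run lint/regression checks".

```python
def run_with_checkpoints(tasks, check):
    """Apply tasks one at a time and stop at the first failed check,
    so a bug introduced in task N never contaminates task N+1.

    `tasks` and `check` are hypothetical stand-ins: each task is a
    callable that makes one atomic change, and `check` wraps whatever
    lint/regression command the project uses, returning True on pass.
    """
    completed = []
    for i, task in enumerate(tasks, start=1):
        task()           # apply-task-i
        if not check():  # check-i: lint clean? no regression?
            raise RuntimeError(f"check-{i} failed; fix before continuing")
        completed.append(i)
    return completed

# Demo: the second task introduces a "bug" and the loop halts there.
state = []
tasks = [lambda: state.append("ok"), lambda: state.append("bug")]
try:
    run_with_checkpoints(tasks, check=lambda: state[-1] == "ok")
except RuntimeError as e:
    print(e)  # check-2 failed; fix before continuing
```

The point is the control flow, not the checks themselves: later tasks simply never execute on top of a known-bad state.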
3. Runtime verification in Verify
Static checks (lint, tsc --noEmit for TypeScript).
Unit tests for each new feature.
Integration tests for multi‑module changes.
Manual validation points for critical business logic.
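A verify stage that actually executes code can be as simple as a runner that shells out to each check and aggregates failures. A minimal sketch follows; the command list is an assumption for a typical TypeScript project (eslint, tsc, a test runner) and is in no way part of OpenSpec itself.

```python
import subprocess

# Assumed commands for a TypeScript project; swap in whatever the
# repository actually uses.
CHECKS = [
    ("lint",  ["npx", "eslint", "."]),
    ("types", ["npx", "tsc", "--noEmit"]),
    ("tests", ["npx", "vitest", "run"]),
]

def runtime_verify(checks=CHECKS):
    """Run each check command and return the names of those that failed,
    instead of only diffing implementation text against the Spec."""
    failures = []
    for name, cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            failures.append(name)
    return failures
```

An empty return list means the change survived execution, not merely a textual comparison; anything else blocks progression to Archive.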
4. Adopt TDD‑like discipline
Define acceptance criteria in Design.
Specify concrete test cases in Tasks.
Write tests before implementation in Apply.
Fail Archive if any test does not pass.
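The test-first loop can be illustrated with a toy feature. The discount function and its acceptance criterion here are invented for illustration; the point is the ordering: the criterion from Design becomes an executable test before the implementation exists.

```python
# Step 1 (Design/Tasks): the acceptance criterion "a discount never
# pushes a price below zero" is written as concrete test cases first.
def test_discount_is_clamped_at_zero():
    assert apply_discount(price=10.0, discount=15.0) == 0.0

def test_normal_discount():
    assert apply_discount(price=10.0, discount=3.0) == 7.0

# Step 2 (Apply): only now is the implementation written, with the sole
# goal of making the tests above pass.
def apply_discount(price: float, discount: float) -> float:
    return max(price - discount, 0.0)

# Step 3 (pre-Archive): Archive is blocked unless every test passes.
test_discount_is_clamped_at_zero()
test_normal_discount()
```

Because the tests encode the Design's acceptance criteria, a passing suite is evidence of behavioral correctness, not just textual alignment with the Spec.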
5. Pre‑Archive quality gate
All tasks completed.
Spec Review passed.
All atomic checks passed.
Runtime verification (lint, tests, type checks) passed.
No known unresolved bugs.
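The checklist above is mechanical enough to encode directly. In this sketch, each predicate mirrors one checklist item; the field names are illustrative, not actual OpenSpec state.

```python
from dataclasses import dataclass

@dataclass
class ChangeState:
    # Hypothetical change-tracking fields, one per gate condition.
    tasks_done: int
    tasks_total: int
    spec_review_passed: bool
    atomic_checks_passed: bool
    runtime_checks_passed: bool  # lint + tests + type checks
    open_bugs: int

def may_archive(s: ChangeState) -> bool:
    """Allow Archive only when every gate condition holds."""
    return (
        s.tasks_done == s.tasks_total
        and s.spec_review_passed
        and s.atomic_checks_passed
        and s.runtime_checks_passed
        and s.open_bugs == 0
    )
```

Wired in front of /opsx:archive, this gate makes it impossible for a change that Verify flagged to become the new baseline.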
Practical Prioritisation
Add a Review step and manual testing after Apply – low cost, high impact.
Manually inspect Spec quality after Propose – catch format and boundary issues early.
Enforce test tasks in Tasks – make testing mandatory.
Split Apply into atomic tasks for complex changes.
Introduce the pre‑Archive gate when the workflow stabilises.
Additional Tips
Use OpenSpec’s Edit mechanism to correct Specs before re‑applying, and split large changes into multiple independent changes to reduce context‑window pressure and simplify rollback.
Conclusion
The analysis isolates why the four‑step OpenSpec workflow can finish without producing correct code: documentation‑level alignment without code‑level validation. The five upgrades form a logical quality‑closure path, though they still need real‑project validation to confirm effectiveness.