OpenSpec Deep Dive: 4‑Step Review and 5 Upgrades to Make AI‑Generated Code Run Correctly
This article dissects OpenSpec’s four‑step AI programming workflow, exposing why completing the process often still yields buggy code, and proposes five concrete upgrades—including spec reviews, atomic task checks, runtime verification, TDD practices, and archive gate‑keeping—to close the quality gap.
Design Intent of the Four‑Step Method
OpenSpec (GitHub: Fission-AI/OpenSpec) aligns requirements with AI before coding using the workflow
/opsx:propose → /opsx:apply → /opsx:verify → /opsx:archive. Each stage produces artifacts (proposal.md, specs/, design.md, tasks.md) that depend on one another, providing traceability from intent to implementation.
Step‑by‑Step Gap Analysis
Propose – No quality guard for Specs
The propose phase creates all four artifacts at once, but there is no mechanism to verify that Specs (behavior contracts) are complete or correctly formatted. For example, a missing “####” heading in a Delta Spec is silently ignored during archiving, causing downstream errors.
Apply – Black‑Box execution without intermediate checks
Apply executes tasks sequentially without pause. If a bug appears in task 3, subsequent tasks build on the faulty code, amplifying errors. This is likened to constructing a ten‑storey building without inspecting each floor.
Verify – Textual comparison, not runtime validation
Verify checks whether the implementation matches the Spec text and whether design decisions appear in the code, but it never runs the code. Consequently, logical errors, race conditions, performance issues, and security flaws remain undetected.
Archive – Lacks quality gate‑keeping
Archive merges Delta Specs even if Verify flagged problems, allowing buggy changes to become the new baseline.
Root‑Cause Synthesis
The workflow guarantees documentation alignment but provides no code‑level verification. Two fundamental issues arise: (1) spec quality is never verified at any stage, and (2) AI has limited ability to translate complex designs into correct code, especially under context‑window pressure that can cause it to forget early decisions.
Five Upgrade Proposals
1. Spec Review after Propose
Format validation (ensure “####” headings, complete ADDED/MODIFIED/REMOVED tags).
Consistency check between Spec and original Proposal.
Boundary‑condition review for error handling and edge cases.
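The format‑validation part of this review can be automated. Below is a minimal sketch of such a checker; the heading and tag conventions it enforces ('####' requirement headings, ADDED/MODIFIED/REMOVED section tags) are assumptions about the Delta Spec layout, not OpenSpec's authoritative schema.

```python
import re

def review_spec(text: str) -> list[str]:
    """Collect format problems in a Delta Spec.

    A hypothetical checker: the heading and tag patterns below are
    assumed conventions, not pulled from OpenSpec's source.
    """
    problems = []
    # Every requirement should sit under a level-4 heading; a missing
    # '####' is exactly the silent-archiving failure described above.
    if not re.search(r"^#### ", text, flags=re.MULTILINE):
        problems.append("missing '####' requirement headings")
    # Every delta section should declare which operation it performs.
    tags = re.findall(r"^## (ADDED|MODIFIED|REMOVED)", text, flags=re.MULTILINE)
    if not tags:
        problems.append("no ADDED/MODIFIED/REMOVED section tags")
    return problems

good = "## ADDED Requirements\n#### Requirement: rate limiting\nThe API SHALL ...\n"
bad = "Rate limiting\nThe API SHALL ...\n"
print(review_spec(good))  # []
print(review_spec(bad))   # two problems reported
```

Running this immediately after Propose turns the silent heading failure into a loud one, before Apply ever starts.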
2. Atomic Apply tasks with checkpoints
Split apply into small tasks and insert a check after each:
apply-task-1 → check-1 → apply-task-2 → check-2 → … → apply-task-N → check-N
Confirm the code change matches the task description.
Verify no regression on previously completed tasks.
Run basic lint/static analysis.
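The checkpoint chain above can be sketched as a small driver. This is an illustrative loop, not an OpenSpec API: the task callables and the check predicate are placeholders for "apply one atomic change" and "run lint/regression checks".

```python
def run_with_checkpoints(tasks, check):
    """Apply tasks one at a time and stop at the first failed check,
    so a bug introduced in task N never contaminates task N+1.

    `tasks` and `check` are hypothetical stand-ins: each task is a
    callable that makes one atomic change, and `check` wraps whatever
    lint/regression command the project uses, returning True on pass.
    """
    completed = []
    for i, task in enumerate(tasks, start=1):
        task()           # apply-task-i
        if not check():  # check-i: lint clean? no regression?
            raise RuntimeError(f"check-{i} failed; fix before continuing")
        completed.append(i)
    return completed

# Demo: the second task introduces a "bug" and the loop halts there.
state = []
tasks = [lambda: state.append("ok"), lambda: state.append("bug")]
try:
    run_with_checkpoints(tasks, check=lambda: state[-1] == "ok")
except RuntimeError as e:
    print(e)  # check-2 failed; fix before continuing
```

The point is the control flow, not the checks themselves: later tasks simply never execute on top of a known-bad state.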
3. Runtime verification in Verify
Static checks (lint, tsc --noEmit for TypeScript).
Unit tests for each new feature.
Integration tests for multi‑module changes.
Manual validation points for critical business logic.
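A verify stage that actually executes code can be as simple as a runner that shells out to each check and aggregates failures. A minimal sketch follows; the command list is an assumption for a typical TypeScript project (eslint, tsc, a test runner) and is in no way part of OpenSpec itself.

```python
import subprocess

# Assumed commands for a TypeScript project; swap in whatever the
# repository actually uses.
CHECKS = [
    ("lint",  ["npx", "eslint", "."]),
    ("types", ["npx", "tsc", "--noEmit"]),
    ("tests", ["npx", "vitest", "run"]),
]

def runtime_verify(checks=CHECKS):
    """Run each check command and return the names of those that failed,
    instead of only diffing implementation text against the Spec."""
    failures = []
    for name, cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            failures.append(name)
    return failures
```

An empty return list means the change survived execution, not merely a textual comparison; anything else blocks progression to Archive.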
4. Adopt TDD‑like discipline
Define acceptance criteria in Design.
Specify concrete test cases in Tasks.
Write tests before implementation in Apply.
Fail Archive if any test does not pass.
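The test-first loop can be illustrated with a toy feature. The discount function and its acceptance criterion here are invented for illustration; the point is the ordering: the criterion from Design becomes an executable test before the implementation exists.

```python
# Step 1 (Design/Tasks): the acceptance criterion "a discount never
# pushes a price below zero" is written as concrete test cases first.
def test_discount_is_clamped_at_zero():
    assert apply_discount(price=10.0, discount=15.0) == 0.0

def test_normal_discount():
    assert apply_discount(price=10.0, discount=3.0) == 7.0

# Step 2 (Apply): only now is the implementation written, with the sole
# goal of making the tests above pass.
def apply_discount(price: float, discount: float) -> float:
    return max(price - discount, 0.0)

# Step 3 (pre-Archive): Archive is blocked unless every test passes.
test_discount_is_clamped_at_zero()
test_normal_discount()
```

Because the tests encode the Design's acceptance criteria, a passing suite is evidence of behavioral correctness, not just textual alignment with the Spec.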
5. Pre‑Archive quality gate
All tasks completed.
Spec Review passed.
All atomic checks passed.
Runtime verification (lint, tests, type checks) passed.
No known unresolved bugs.
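The checklist above is mechanical enough to encode directly. In this sketch, each predicate mirrors one checklist item; the field names are illustrative, not actual OpenSpec state.

```python
from dataclasses import dataclass

@dataclass
class ChangeState:
    # Hypothetical change-tracking fields, one per gate condition.
    tasks_done: int
    tasks_total: int
    spec_review_passed: bool
    atomic_checks_passed: bool
    runtime_checks_passed: bool  # lint + tests + type checks
    open_bugs: int

def may_archive(s: ChangeState) -> bool:
    """Allow Archive only when every gate condition holds."""
    return (
        s.tasks_done == s.tasks_total
        and s.spec_review_passed
        and s.atomic_checks_passed
        and s.runtime_checks_passed
        and s.open_bugs == 0
    )
```

Wired in front of /opsx:archive, this gate makes it impossible for a change that Verify flagged to become the new baseline.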
Practical Prioritisation
Add a Review step and manual testing after Apply – low cost, high impact.
Manually inspect Spec quality after Propose – catch format and boundary issues early.
Enforce test tasks in Tasks – make testing mandatory.
Split Apply into atomic tasks for complex changes.
Introduce the pre‑Archive gate when the workflow stabilises.
Additional Tips
Use OpenSpec’s Edit mechanism to correct Specs before re‑applying, and split large changes into multiple independent changes to reduce context‑window pressure and simplify rollback.
Conclusion
The analysis isolates why the four‑step OpenSpec workflow can finish without producing correct code: documentation‑level alignment without code‑level validation. The five upgrades form a logical quality‑closure path, though they still need real‑project validation to confirm effectiveness.