
Why Spec‑First AI Coding Falls Short: Lessons from OpenAI’s Symphony

The article critically examines the hype around Agentic Coding by showing that detailed specifications are essentially code, using OpenAI’s Symphony as a case study, and demonstrates that spec‑driven generation is unstable, error‑prone, and often no faster than writing code directly.

High Availability Architecture

Mar 23, 2026


Misconceptions about Specs

A spec is simpler than code. Proponents of “Agentic Coding” claim that engineers can act purely as managers who write a specification and let agents generate the implementation. This assumes that describing work is cheaper than doing it. In practice, a precise spec must encode as much detail as the code it produces, so it ends up roughly as complex as the implementation.

Writing a spec is inherently more thoughtful than writing code. The argument is that a spec forces careful design. However, industry pressure to ship quickly often leads to shallow, incomplete specs that are hard to maintain, negating any supposed quality benefit.

Spec as Code: OpenAI Symphony Example

OpenAI’s Symphony project advertises that the entire system can be generated from a single SPEC.md. The spec reads like pseudo‑code: it contains large structured data blocks, concurrency‑control logic, retry policies, and even full function definitions. One data block, for instance, lists fields like these:

session_id (string, <thread_id>-<turn_id>)
thread_id (string)
turn_id (string)
codex_app_server_pid (string or null)
last_codex_event (string/enum or null)
last_codex_timestamp (timestamp or null)
last_codex_message (summarized payload)
codex_input_tokens (integer)
codex_output_tokens (integer)
codex_total_tokens (integer)
last_reported_input_tokens (integer)
last_reported_output_tokens (integer)
last_reported_total_tokens (integer)
turn_count (integer)
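The field list above is already a type definition in all but syntax. As a minimal sketch, here it is transcribed into a Python dataclass. The class name `SessionMetadata`, the Python types chosen for "timestamp" and "summarized payload", and the default values are assumptions, not part of the spec.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SessionMetadata:
    """Hypothetical transcription of the spec's field list; field names follow the spec."""
    thread_id: str
    turn_id: str
    codex_app_server_pid: Optional[str] = None
    last_codex_event: Optional[str] = None          # "string/enum or null"
    last_codex_timestamp: Optional[datetime] = None
    last_codex_message: Optional[str] = None        # assumed: summarized payload stored as text
    codex_input_tokens: int = 0
    codex_output_tokens: int = 0
    codex_total_tokens: int = 0
    last_reported_input_tokens: int = 0
    last_reported_output_tokens: int = 0
    last_reported_total_tokens: int = 0
    turn_count: int = 0

    @property
    def session_id(self) -> str:
        # The spec defines session_id as "<thread_id>-<turn_id>".
        return f"{self.thread_id}-{self.turn_id}"
```

Deriving `session_id` as a property rather than storing it is one design choice; the spec lists it as a plain string field.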

Concurrency control is expressed as a code snippet:

available_slots = max(max_concurrent_agents - running_count, 0)
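That one line is already valid Python. Wrapped in a function, with a hypothetical `pick_dispatchable` helper (not from the spec) that uses it to cap how many candidates are started, it might look like this:

```python
def available_slots(max_concurrent_agents: int, running_count: int) -> int:
    # Direct transcription of the spec's formula. The clamp at 0 avoids reporting
    # negative capacity, e.g. if more agents are running than the current limit allows.
    return max(max_concurrent_agents - running_count, 0)


def pick_dispatchable(candidates: list[str], max_concurrent_agents: int, running_count: int) -> list[str]:
    # Hypothetical helper: start at most `available_slots` candidates, in the given order.
    return candidates[:available_slots(max_concurrent_agents, running_count)]
```

For example, with a limit of 3 and 1 agent running, `pick_dispatchable(["a", "b", "c"], 3, 1)` returns `["a", "b"]`.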

Retry and back‑off policies are also written in code form, and the startup routine is presented as a full function:

function start_service():
  configure_logging()
  start_observability_outputs()
  start_workflow_watch(on_change=reload_and_reapply_workflow)
  state = {
    poll_interval_ms: get_config_poll_interval_ms(),
    max_concurrent_agents: get_config_max_concurrent_agents(),
    running: {},
    claimed: set(),
    retry_attempts: {},
    completed: set(),
    codex_totals: {input_tokens: 0, output_tokens: 0, total_tokens: 0, seconds_running: 0},
    codex_rate_limits: null
  }
  validation = validate_dispatch_config()
  if validation is not ok:
    log_validation_error(validation)
    fail_startup(validation)
  startup_terminal_workspace_cleanup()
  schedule_tick(delay_ms=0)
  event_loop(state)
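To make the point concrete, here is the same routine as runnable Python. It is a sketch under assumptions: `Config`, `StartupError`, the two validation rules, and the injected `event_loop` callable are hypothetical, and the observability, workflow‑watch, and workspace‑cleanup steps are left as comments because their details are not reproduced here. The translation is almost line for line.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("symphony_sketch")


@dataclass
class Config:
    # Hypothetical stand-in for get_config_poll_interval_ms() and friends.
    poll_interval_ms: int = 30_000
    max_concurrent_agents: int = 10


class StartupError(Exception):
    """Raised where the spec calls fail_startup(validation)."""


def validate_dispatch_config(cfg: Config) -> list[str]:
    # Hypothetical rules; the spec's actual validation list is not reproduced here.
    errors = []
    if cfg.max_concurrent_agents < 1:
        errors.append("max_concurrent_agents must be >= 1")
    if cfg.poll_interval_ms <= 0:
        errors.append("poll_interval_ms must be positive")
    return errors


def start_service(cfg: Config, event_loop: Callable[[dict], None]) -> dict:
    logging.basicConfig(level=logging.INFO)  # configure_logging()
    # start_observability_outputs() and start_workflow_watch(...) omitted in this sketch.
    state = {
        "poll_interval_ms": cfg.poll_interval_ms,
        "max_concurrent_agents": cfg.max_concurrent_agents,
        "running": {},
        "claimed": set(),
        "retry_attempts": {},
        "completed": set(),
        "codex_totals": {"input_tokens": 0, "output_tokens": 0,
                         "total_tokens": 0, "seconds_running": 0},
        "codex_rate_limits": None,
    }
    errors = validate_dispatch_config(cfg)
    if errors:
        for e in errors:
            log.error("invalid dispatch config: %s", e)  # log_validation_error(validation)
        raise StartupError("; ".join(errors))            # fail_startup(validation)
    # startup_terminal_workspace_cleanup() and schedule_tick(delay_ms=0) omitted.
    event_loop(state)  # injected so the sketch can run without blocking forever
    return state
```

Injecting `event_loop` is only a convenience for running the sketch; a real service would block in its loop rather than return.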

Because the spec reads like executable code, it cannot be considered a simpler alternative to writing code directly.

Instability of Spec‑Driven Generation

Even when a spec is treated as a literal blueprint, generation is unreliable. An attempt to implement Symphony in Haskell with Claude Code produced numerous bugs, and the resulting agents stalled on a simple Linear ticket. The full attempt, including its commit history, is available in the repository:

https://github.com/Gabriella439/symphony-haskell

Typical failure modes included:

Compilation errors that required repeated manual prompts to the AI model.

Runtime agents that entered a no‑progress loop, repeatedly creating empty Git repositories without advancing the ticket.

Mature specifications show the same weakness. The YAML spec, for example, comes with an extensive test suite, yet that suite still cannot guarantee full compliance across implementations.

AI‑Generated Noise in the Spec

Parts of the Symphony spec read like AI‑generated filler with no coherent narrative. One example is the linear_graphql extension contract, which lists fields and validation rules without explaining their purpose:

{
  "query": "single GraphQL query or mutation document",
  "variables": {
    "optional": "graphql variables object"
  }
}

The surrounding description enumerates requirements (non‑empty query, single operation, optional variables, etc.) but offers no higher‑level design insight, illustrating how a spec can become “garbage” when rushed.
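Those requirements translate almost directly into code. A minimal sketch, assuming the input arrives as an already‑decoded JSON object (a Python dict); the function names are hypothetical, only the three requirements named above are checked, and the single‑operation check is a rough lexical heuristic rather than a real GraphQL parse.

```python
def count_top_level_operations(doc: str) -> int:
    # Heuristic: count brace blocks opened at depth 0, skipping those introduced
    # by "fragment", and ignoring braces inside strings and "#" comments.
    depth, count = 0, 0
    in_string = in_comment = False
    header: list[str] = []  # text seen at depth 0 before the next block opens
    i = 0
    while i < len(doc):
        ch = doc[i]
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == "#":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                words = "".join(header).split()
                if not words or words[0] != "fragment":
                    count += 1
                header = []
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            header.append(ch)
        i += 1
    return count


def validate_linear_graphql_input(payload: dict) -> list[str]:
    # Returns a list of error messages; an empty list means the input passed.
    errors = []
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query must be a non-empty string")
    elif count_top_level_operations(query) != 1:
        errors.append("query must contain exactly one operation")
    variables = payload.get("variables")
    if variables is not None and not isinstance(variables, dict):
        errors.append("variables, if present, must be an object")
    return errors
```

A production version would use a proper GraphQL parser to count operation definitions instead of scanning braces.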

Conclusion

A sufficiently detailed specification is effectively code; it does not reduce the amount of engineering work required. Writing code directly is often more efficient than first producing a spec that is as complex as the implementation. The “garbage‑in, garbage‑out” principle applies: an ambiguous or incomplete spec cannot magically yield a correct program, and AI agents lack the intuition to fill in missing design intent.