Why Spec-Driven Development Remains Crucial in the Age of Harness Engineering

This article examines Harness Engineering and Spec‑Driven Development (SDD), explaining how structured specs act as a map, semantic foundation, and correctness criterion for AI agents, and why investing in a robust spec system still adds decisive value despite powerful harness tools.

Tencent Cloud Developer

TL;DR

- Harness Engineering and SDD are two layers of the same concept.
- Engineering discipline shifts from writing code to building scaffolding that agents can reliably use.
- Specs become the core content amplified by a strong harness.

01 What Is a Harness?

Mitchell Hashimoto’s February 2026 blog post describes the fifth stage of AI adoption as “Engineer the Harness”: whenever an agent makes a mistake, engineers create a permanent solution so the mistake never recurs. He records bad agent behaviors in an AGENTS.md file (e.g., Ghostty’s file, where each line represents a real failure that was subsequently fixed) and builds custom tools (screenshots, filter tests) that let agents verify their own work.
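For illustration, entries in such a file pair a rule with the failure that motivated it. The entries below are hypothetical, not taken from Ghostty’s actual file:

```markdown
<!-- Each rule records a real agent failure and the fix that prevents it. -->
- Never edit generated files under `src/gen/`; regenerate them with the build
  tool instead. (An agent hand-patched generated output, which was overwritten
  on the next build.)
- Always run the filter tests before opening a PR that touches rendering.
  (An agent shipped a rendering regression the tests would have caught.)
```

The point is that each line is a permanent, checkable correction rather than a general style preference.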

02 OpenAI’s View

“The obvious truth is that building software still requires discipline, but the discipline now lives in the scaffolding rather than the code. Tools, abstractions, and feedback loops that keep the codebase consistent become ever more important.” “Our toughest challenge today is designing environments, feedback loops, and control systems that let agents achieve our goals at scale.”

OpenAI’s official Harness Engineering blog post (Feb 2026) describes a five‑month experiment: starting from an empty repo, agents generated ~1M lines of code and 1,500 PRs. The agents wrote code, while humans focused on product specs, design constraints, and observability tooling that the agents could consume.

03 Spec’s Three Roles in Harness

Role 1 – A Map for Agent Reasoning

Agents need a navigable directory, not a 1 000‑page manual. System‑level specs describe services, boundaries, and responsibilities; service‑level specs detail capabilities, API semantics, and behavior rules. The agent starts from a high‑level overview and drills down only to the parts it needs.

Role 2 – Semantic Foundation for Constraints

Linter‑style checks enforce format rules (file size, naming), but they cannot verify meaning. Semantic constraints—error‑code contracts, cross‑service field meanings, state‑flow rules—must be written explicitly in the spec. When service A defines an error code that service B’s agent does not understand, the agent silently guesses, leading to bugs. Adding the missing contract to the spec lets the same agent generate correct code without changing the model.
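A minimal sketch of this idea, with hypothetical service and error names (none of these appear in the article): the spec declares the error‑code contract as data, and a validator flags any code the consuming agent references that the contract does not define.

```python
# Hypothetical error-code contract as it might be declared in a spec.
SPEC_ERROR_CONTRACT = {
    "order-service": {
        "ORDER_NOT_FOUND": "Order ID does not exist; caller should return 404.",
        "ORDER_LOCKED": "Order is mid-settlement; caller should retry with backoff.",
    }
}

def undefined_error_codes(service: str, codes_used_in_code: set[str]) -> set[str]:
    """Return error codes referenced in generated code but absent from the spec."""
    declared = set(SPEC_ERROR_CONTRACT.get(service, {}))
    return codes_used_in_code - declared

# Usage: the consuming agent guessed a code the spec never defined.
print(undefined_error_codes("order-service", {"ORDER_LOCKED", "ORDER_EXPIRED"}))
# → {'ORDER_EXPIRED'}
```

The check is trivial once the contract exists as data; the hard part, as the article argues, is writing the contract down at all.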

Role 3 – Correctness Criterion for the Feedback Loop

The harness’s core flywheel is: Agent error → diagnosis → engineered fix → agent never repeats the error. To close the loop, you must know the agent erred, which requires a testable correctness criterion. Structured WHEN/THEN scenarios in the spec provide that criterion, allowing automated validation of generated code against the spec.
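The WHEN/THEN structure can be made machine‑checkable. A minimal sketch, assuming scenarios are stored as structured data and replayed against an implementation (the `withdraw` function and scenario data are hypothetical examples, not from the article):

```python
# Hypothetical WHEN/THEN scenarios expressed as structured spec data.
SCENARIOS = [
    {"when": {"balance": 50, "withdraw": 30}, "then": {"balance": 20, "ok": True}},
    {"when": {"balance": 50, "withdraw": 80}, "then": {"balance": 50, "ok": False}},
]

def withdraw(balance: int, amount: int) -> dict:
    """System under test: reject overdrafts, otherwise debit the balance."""
    if amount > balance:
        return {"balance": balance, "ok": False}
    return {"balance": balance - amount, "ok": True}

def validate(scenarios) -> list[int]:
    """Return indices of scenarios the implementation violates (empty = spec met)."""
    failures = []
    for i, s in enumerate(scenarios):
        result = withdraw(s["when"]["balance"], s["when"]["withdraw"])
        if result != s["then"]:
            failures.append(i)
    return failures

print(validate(SCENARIOS))  # → []
```

A non‑empty result is exactly the “agent erred” signal the flywheel needs to close the loop.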

04 Why Spec‑Driven Development (SDD) Still Matters

Spec quality directly caps agent output quality: the better the input, the higher the probability of correct output. SDD delivers value at three layers:

Capability level: precise WHEN/THEN scenarios keep agents within defined behavior bounds.

System‑level integration: shared contracts let multiple services’ agents see the same semantics.

Continuous correctness: automated spec validation provides a regression baseline, preventing drift.

OpenAI’s own experience shows that without specs, cross‑service ambiguities surface only during integration or after release, inflating total delivery cost. Writing specs up‑front eliminates that hidden cost.

05 Practical Insights

Human attention is the scarcest resource – focus reviews on specs, not on generated code. A well‑reviewed spec reduces downstream code‑review effort because most semantic issues are caught early.

Large AGENTS.md files are a trap. Mixing execution constraints (e.g., “do not use API X”) with semantic constraints inflates the file, pushes useful context out of the model’s window, and becomes hard to verify. The solution is a lightweight entry file that only points agents to the appropriate deeper docs.
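Such an entry file might be only a few lines, routing rather than explaining. The paths below are hypothetical:

```markdown
# AGENTS.md — entry point only; keep this file short
- Architecture and service boundaries: see `docs/specs/system.md`
- Per-service API semantics: see `docs/specs/<service>/spec.md`
- Execution constraints (tooling, forbidden APIs): see `docs/agent-rules.md`
```

This keeps the always‑loaded context small while still giving the agent a path to every deeper constraint.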

Spec drift is inevitable. Code evolves faster than manual spec updates, leading to silent mismatches where an “Active” spec no longer reflects the actual system. OpenAI mitigates this by running a periodic “doc‑gardening” agent that scans for drift and opens PRs. A proactive spec‑code consistency checker that compares interfaces, schemas, and configs would be a valuable next step.
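Such a consistency checker could start as simply as diffing the endpoints a spec declares against those the code registers. A rough sketch with hypothetical inputs (e.g., one set extracted from an OpenAPI file, the other from a router’s registration table):

```python
def spec_drift(spec_endpoints: set[str], code_endpoints: set[str]) -> dict:
    """Report endpoints present on only one side; two empty lists means no drift."""
    return {
        "in_spec_only": sorted(spec_endpoints - code_endpoints),  # documented, unimplemented
        "in_code_only": sorted(code_endpoints - spec_endpoints),  # implemented, undocumented
    }

# Hypothetical example: one endpoint was removed from code, another added,
# and the spec was never updated.
report = spec_drift(
    {"GET /orders", "POST /orders", "GET /orders/{id}"},
    {"GET /orders", "POST /orders", "DELETE /orders/{id}"},
)
print(report)
```

Real drift detection would also compare schemas and configs, but even this endpoint diff turns silent mismatches into an actionable report an agent could open a PR against.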

Knowledge assetization. Specs turn fleeting design discussions and cross‑service decisions into versioned, searchable assets. Even when people or agents change, the spec remains the authoritative memory.

06 Conclusion

Harness acts as an amplifier: the stronger the harness, the more impact the spec has on the final product. A well‑crafted spec becomes the navigation system that prevents agents from veering off course, turning speculative AI coding into reliable, maintainable software development.

References:

Mitchell Hashimoto, “My AI Adoption Journey”, Feb 2026 – https://mitchellh.com/writing/my-ai-adoption-journey

OpenAI Engineering Team, “Harness Engineering”, Feb 2026 – https://openai.com/index/harness-engineering/

Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
