From Code Generation to Harnessing Non‑Determinism: Martin Fowler’s AI Development Insight
Martin Fowler argues that the biggest shift in AI‑driven software development is no longer about making models write code, but about integrating the inherent nondeterminism of AI into a verifiable, rollback‑capable engineering pipeline—what he calls Harness Engineering—to preserve reliability and governance.
After re‑listening to Martin Fowler’s two‑hour interview on The Pragmatic Engineer and reviewing his recent essays, the author highlights a single, powerful insight: software engineering for decades has relied on a deterministic machine, and now a nondeterministic collaborator—an AI model—has entered the development pipeline.
From Code Generation to Engineering Nondeterminism
Fowler stresses that the most significant change is not that AI will write more code, but that the focus is shifting to packaging the nondeterministic behavior of AI into a verifiable, rollback‑ready, and governable engineering system. This reframes many buzzwords—Vibe Coding, Agentic Engineering, Context Engineering, Harness Engineering, Subagents, Skills, Agent consoles—into a single problem: how does a development system digest a nondeterministic collaborator that can read repositories, modify files, run tests, open PRs, and inspect logs?
Vibe Coding’s Boundary
Vibe coding is effective for prototypes, one‑off tools, and short‑lived scripts, but when its output is turned into a long‑term asset it introduces challenges around learning loops, code ownership, and system understandability.
Traditional practices—TDD, refactoring, CI, static analysis—remain vital; the faster AI generates code, the more valuable deterministic feedback becomes.
"Harness" is not just a marketing term; it is the adaptation layer that carries context, tools, permissions, testing, observability, and garbage collection.
Agentic Engineering shifts the goal from merely automating tasks to preserving human‑defined goals, boundaries, verification, and experience.
Teams should start slowly: instead of building a fully automated AI team, first tackle six modest practices—small slices, strong verification, repository knowledge, permission boundaries, error classification, and continuous cleanup.
Why the Analogy to Assembly‑to‑Fortran Is Incomplete
Moving from assembly to high‑level languages abstracted away hardware details but still ran on a deterministic machine. AI adds a probabilistic layer: the same goal can be achieved via different paths, explanations may be plausible yet unverified, and a single change can unintentionally modify many surrounding pieces.
Learning Loop and Vibe Coding
Fowler warns that if AI‑generated changes are not reviewed, understood, or refactored, the learning loop—code → feedback → system understanding → design correction—breaks. Short‑term speed gains can lead to long‑term fragility where the system "runs but no one truly understands".
Testing, Refactoring, and Small Slices
Because AI can change many files at once, the author recommends thin slices: understand one logic segment, modify a single boundary, run tests, type checks, lint, and let deterministic refactoring tools handle the rest. This limits the radius of AI‑driven divergence and keeps review manageable.
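The thin-slice loop described above can be sketched as a small deterministic gate: run each check in order and stop at the first failure, so an AI-driven change never proceeds past an unverified step. This is a minimal illustration, not Fowler's tooling; the specific commands (pytest, mypy, ruff) are placeholder assumptions for a real project's test, type-check, and lint steps.

```python
import subprocess
import sys

# Placeholder checks for a hypothetical project; substitute the real
# test, type-check, and lint commands for your repository.
CHECKS = [
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
    ("types", [sys.executable, "-m", "mypy", "src/"]),
    ("lint",  [sys.executable, "-m", "ruff", "check", "src/"]),
]

def run_gate(checks):
    """Run each (name, command) check in order.

    Returns (passed, failed_step): stops at the first non-zero exit
    so the AI's change radius stays small and reviewable.
    """
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            return False, name
    return True, None
```

The point of the gate is that its verdict is deterministic: the same diff always produces the same pass/fail signal, which is exactly the external feedback the nondeterministic collaborator lacks.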
Harness as the Nondeterminism Adaptation Layer
According to LangChain’s "The Anatomy of an Agent Harness", an agent equals Model + Harness. Harness incorporates file systems, tools, sandboxes, state, sub‑agents, hooks, verification, and long‑task control. OpenAI’s "Harness Engineering" adds engineering‑level concerns: planning assets, documentation in version‑controlled repositories, lint‑enforced architecture rules, incremental technical debt cleanup, and encoding human preferences into the repo.
Mitchell Hashimoto’s "My AI Adoption Journey" illustrates the practical step of "Engineer the Harness": after a model error, capture the fix as a rule (e.g., AGENTS.md), a script, or an approval policy, turning a one‑off correction into a reusable safeguard.
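Hashimoto's "engineer the harness" step could look something like the sketch below: after fixing a model error, append the lesson to a version-controlled rule file such as AGENTS.md so the one-off correction becomes a reusable safeguard. The function name, file layout, and entry format are illustrative assumptions, not a documented API.

```python
from datetime import date
from pathlib import Path

def capture_rule(repo_root: str, mistake: str, rule: str) -> Path:
    """Append a learned rule to AGENTS.md in the given repository.

    Turns a one-off correction into a safeguard the agent reads on
    every future run. Entry wording is a hypothetical convention.
    """
    path = Path(repo_root) / "AGENTS.md"
    entry = (
        f"\n## Rule added {date.today().isoformat()}\n"
        f"- Observed mistake: {mistake}\n"
        f"- Rule for future runs: {rule}\n"
    )
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return path
```

Because the rule file lives in the repository, it travels through the same review, versioning, and rollback machinery as the code it protects.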
Safety and the Lethal Trifecta
Model mistakes can appear correct—apologizing, providing plausible explanations, or claiming tests passed. Fowler recounts an example where a model wrote the wrong date in a config comment, apologized, and then wrote an even older date, effectively gaslighting the developer.
Therefore, safety must come from external feedback: test results, type correctness, dependency boundaries, approval logs, runtime metrics, and structured error traces. Simon Willison’s "lethal trifecta" (private data, untrusted content, outbound communication) highlights the need for isolation, auditability, and structured failure handling.
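Willison's lethal trifecta lends itself to a simple policy check: an agent step that combines access to private data, ingestion of untrusted content, and outbound communication should be blocked or escalated. The flag names and three-tier policy below are illustrative assumptions, not Willison's implementation.

```python
def trifecta_risk(reads_private_data: bool,
                  ingests_untrusted_content: bool,
                  can_communicate_outbound: bool) -> str:
    """Classify an agent step against the lethal trifecta.

    All three capabilities together enable data exfiltration via
    prompt injection, so that combination is blocked outright.
    """
    flags = [reads_private_data, ingests_untrusted_content, can_communicate_outbound]
    if all(flags):
        return "block"    # full trifecta: exfiltration is possible
    if sum(flags) == 2:
        return "review"   # one capability away from the full trifecta
    return "allow"
```

A check like this belongs in the harness's permission layer, where its verdicts can be logged and audited alongside approvals.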
Engineering Role Evolution
Fowler cites a study of 158 engineers describing a new "supervisory engineering work" layer: defining tasks, organizing context, supervising agents, evaluating output, converting errors into rules, and feeding experience back into the system. This sits between the traditional inner loop (code, test, debug) and outer loop (review, CI/CD, release).
Examples from Google, Karpathy, Cursor, and Boris Cherny show that as agents become more capable, engineers shift from writing code to managing goals, boundaries, verification, and system evolution.
Six Practical Actions to Start Building a Harness
Slice tasks small: avoid giving an agent a whole module to refactor; start with independent, verifiable tasks such as adding a test, fixing a clear bug, or replacing a deterministic API.
Put knowledge back into the repository: capture design decisions, constraints, and past pitfalls as docs, ADRs, or rule files so the agent can retrieve the same context humans use.
Run validation first: ensure critical paths have tests, add type or lint checks, and block high‑risk dependency changes until safeguards exist.
Layer permissions by risk: differentiate low‑risk actions (auto‑approved) from medium‑risk (require confirmation) and high‑risk (require approval, logging, and rollback).
Classify errors: replace generic messages like "tool failed" with specific categories (parameter error, environment error, permission error, timeout, vendor error, user abort, test failure, verification failure).
Write experience back into the Harness: after fixing an agent error, add a test, lint rule, task template, documentation entry, parameter validator, or approval policy to prevent recurrence.
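Two of the actions above, permission layering and error classification, can be sketched in a few lines. The tier assignments, action names, and error categories here are illustrative assumptions; the categories mirror the list above, while the specific risk mapping is hypothetical.

```python
from enum import Enum

class Risk(Enum):
    LOW = "auto-approve"             # e.g. read files, run tests
    MEDIUM = "confirm"               # e.g. edit source on a branch
    HIGH = "approve+log+rollback"    # e.g. change dependencies

class AgentError(Enum):
    """Specific categories replacing a generic "tool failed" message."""
    PARAMETER = "parameter error"
    ENVIRONMENT = "environment error"
    PERMISSION = "permission error"
    TIMEOUT = "timeout"
    VENDOR = "vendor error"
    USER_ABORT = "user abort"
    TEST_FAILURE = "test failure"
    VERIFICATION = "verification failure"

# Hypothetical mapping from agent actions to risk tiers.
ACTION_RISK = {
    "read_file": Risk.LOW,
    "run_tests": Risk.LOW,
    "edit_file": Risk.MEDIUM,
    "update_dependency": Risk.HIGH,
}

def gate(action: str) -> Risk:
    """Look up an action's risk tier; unknown actions default to HIGH."""
    return ACTION_RISK.get(action, Risk.HIGH)
```

Defaulting unknown actions to HIGH is the safe failure mode: anything the harness has not explicitly classified requires approval, logging, and rollback.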
Conclusion
Fowler’s cautionary tone reminds us that AI’s impact on software engineering is still early. The real work is not to replace engineers but to embed engineering experience into the system so that a nondeterministic collaborator can operate safely within deterministic boundaries.
In short, the AI development agenda this year is moving from making models better at writing code to wrapping nondeterminism in a verifiable, rollback‑ready, and governable engineering stack.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
