Harnessing AI Agents: Turning Probabilistic Output into Deterministic Engineering
This article examines how to reconcile the probabilistic nature of large language model agents with the determinism that engineering systems require. Its answer is a harness that narrows the solution space, enforces strict rules, and reshapes organizational workflows so that AI‑native development becomes reliable.
Problem
Large language models generate output probabilistically, which conflicts with the absolute certainty required by production software. To obtain large‑scale, maintainable, and trustworthy code, a surrounding harness must be built.
What a Harness Is
A harness converts uncertainty into certainty by abandoning unrestricted generation and instead applying detailed prompts, explicit rules, and engineered guardrails that filter and align model output.
Context Engineering
Effective context engineering intertwines static and dynamic knowledge.
The static knowledge base documents domain models, API contracts, and architectural decisions; it forms the baseline context for agents.
Dynamic context injects real‑time observability data, test‑coverage reports, and even browser navigation state into the agent’s workflow. Without it, an agent produces syntactically correct but logically detached code.
Embedding all specifications in a single .cursorrules file overloads the model’s context. The Trellis framework solves this by modularising specifications under a spec/ directory (e.g., database‑guidelines.md) and loading only the relevant modules at runtime via JSONL task files in tasks/.
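A minimal sketch of per-task spec loading, to make the idea concrete. It is not Trellis's actual loader: the task record shape (`{"id": ..., "specs": [...]}`) and the function name `load_task_context` are assumptions made for illustration. The point is only that a task names the spec modules it needs, and nothing else enters the context.

```python
import json
from pathlib import Path


def load_task_context(task_line: str, spec_dir: Path) -> str:
    """Build agent context from only the spec modules a task names.

    Assumes a hypothetical JSONL record shape: {"id": ..., "specs": [...]}.
    """
    task = json.loads(task_line)
    sections = []
    for name in task.get("specs", []):
        path = spec_dir / name
        if not path.is_file():
            # Fail loudly rather than silently dropping a rule set.
            raise FileNotFoundError(
                f"task {task.get('id')} references missing spec {name}"
            )
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

Raising on a missing module is a deliberate choice in this sketch: an agent that runs without a rule set it was supposed to see fails in ways that are hard to trace back.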
Architectural Constraints
Deterministic architectural constraints act as a line of defense. Custom static analysis tools and AST‑based checkers intercept non‑compliant dependency calls (for example, an agent attempting a direct database connection from the UI layer) and return a structured error with remediation guidance. This creates a self‑healing loop: the agent receives precise feedback and must submit a corrected change.
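The following is a minimal sketch of such a checker, using Python's standard `ast` module. The rule name `ui-no-direct-db`, the list of forbidden modules, and the shape of the violation record are all assumptions for illustration; a real harness would derive them from its own architecture rules.

```python
import ast

# Hypothetical layering rule: UI-layer modules may not import database drivers.
FORBIDDEN_IN_UI = {"sqlite3", "psycopg2", "sqlalchemy"}


def check_ui_imports(source: str, filename: str) -> list[dict]:
    """Return structured violations for forbidden imports in UI-layer code."""
    violations = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y" only; relative imports are skipped.
            modules = [node.module]
        else:
            continue
        for module in modules:
            if module.split(".")[0] in FORBIDDEN_IN_UI:
                violations.append({
                    "file": filename,
                    "line": node.lineno,
                    "rule": "ui-no-direct-db",
                    "message": f"UI layer imports '{module}' directly",
                    "remediation": "Call the data-access service instead.",
                })
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(violations, key=lambda v: v["line"])
```

Each violation carries a location, a rule identifier, and a remediation hint, so the agent gets something it can act on rather than a bare failure.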
Code Entropy and Garbage Collection
Fully autonomous agents reproduce existing patterns, including legacy design flaws, leading to architectural drift. Manual weekly cleanup of “AI residue” consumed ~20% of developer time and proved unsustainable.
Solution: senior engineers encode subjective “golden principles” as mechanical rules (e.g., mandatory shared utility packages, prohibition of ad‑hoc helper scripts, strict SDK type usage). Background agents run periodically, scan the repository for deviations, and open targeted refactor pull requests that can be reviewed and merged within a minute, analogous to memory‑management garbage collection.
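As an illustration, one golden principle (no ad‑hoc helper scripts outside the shared utility package) can be turned into a mechanical scan. This is a sketch under stated assumptions: the package name `shared`, the file names treated as ad‑hoc, and the finding format are hypothetical, and opening the refactor pull request is left out.

```python
from pathlib import PurePosixPath

# Hypothetical rule parameters: helpers must live in the shared package.
SHARED_PACKAGE = "shared"
AD_HOC_NAMES = {"utils.py", "helpers.py", "misc.py"}


def find_ad_hoc_helpers(repo_files: list[str]) -> list[dict]:
    """Flag helper modules that live outside the shared utility package."""
    findings = []
    for raw in repo_files:
        path = PurePosixPath(raw)
        if path.name in AD_HOC_NAMES and path.parts[0] != SHARED_PACKAGE:
            findings.append({
                "path": raw,
                "rule": "no-ad-hoc-helpers",
                "suggestion": f"Move into {SHARED_PACKAGE}/ and update imports.",
            })
    return findings
```

Keeping one finding per file matches the idea of small, targeted pull requests: each finding is narrow enough to review quickly.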
Feedback Loop
To prevent the model’s attention from being diluted by noisy logs, the harness enforces a zero‑output principle: passing tests produce no output; failing tests emit only a concise error stack and failure reason. Before an agent marks a task as completed, the system forces an acceptance checklist that requires the agent to verify each requirement against the documented boundaries. A retry threshold aborts repeated modifications to the same file when tests continue to fail, forcing a rollback and manual re‑examination.
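The zero‑output principle and the retry threshold can be sketched as two small pieces. The names `CaseResult`, `render_feedback`, and `RetryGuard`, and the default threshold of 3, are assumptions for illustration; the sketch reports only the failure reason and leaves the rollback itself to the caller.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    name: str
    passed: bool
    reason: str = ""


def render_feedback(results: list[CaseResult]) -> str:
    """Zero-output principle: nothing for passes, one terse line per failure."""
    return "\n".join(
        f"FAIL {r.name}: {r.reason}" for r in results if not r.passed
    )


@dataclass
class RetryGuard:
    """Counts failed edits per file and signals when to stop and roll back."""
    max_attempts: int = 3  # hypothetical threshold
    failures: Counter = field(default_factory=Counter)

    def record_failure(self, path: str) -> bool:
        """Return True once this file has reached the threshold."""
        self.failures[path] += 1
        return self.failures[path] >= self.max_attempts
```

When `record_failure` returns `True`, the harness would stop the agent, revert the file, and hand the task to a person.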
Organizational Harness
Conway’s Law implies that system design mirrors communication structure. AI blurs traditional boundaries, so roles shift:
Product managers can generate runnable front‑end prototypes.
Designers embed interaction constraints directly into component specifications.
Engineers focus on state management, error handling, and maintainability.
Responsibility is re‑assigned from “who wrote the line” to “who defined the system” across three layers:
Requirement responsibility: defines goals and acceptance criteria.
Architectural responsibility: defines boundaries, patterns, and enforces consistency.
Release responsibility: decides production deployment and assumes risk.
AI‑Native Development Workflow
Requirement generation: PM delivers a runnable prototype with explicit acceptance criteria, clarifying intent before engineering decisions.
Joint review: cross‑functional teams (design, front‑end, back‑end, algorithms) evaluate user paths, data flow, state boundaries, existing services, needed abstractions, risk points, and automation of acceptance. The review output is stored in the repository as structured constraints for the agent.
Engineering consolidation: engineers refactor the prototype into production components, align state management, add type checks, monitoring, integrate real APIs, and write regression tests, while handling cross‑module impact.
Automated verification and gradual rollout: dedicated teams run automated tests, perform canary releases, and feed feedback into the harness for continuous improvement.
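The joint review step stores its output in the repository as structured constraints. One possible shape for such a record is sketched below. The field names and the `ReviewConstraints` type are hypothetical, chosen to mirror the review topics listed above, and JSON is an assumed serialization format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReviewConstraints:
    """Hypothetical record of what a joint review decided for one feature."""
    feature: str
    acceptance_criteria: list[str]
    state_boundaries: list[str] = field(default_factory=list)
    reuse_services: list[str] = field(default_factory=list)
    risk_points: list[str] = field(default_factory=list)


def to_repo_record(constraints: ReviewConstraints) -> str:
    """Serialize a review record as stable, diff-friendly JSON."""
    if not constraints.acceptance_criteria:
        raise ValueError("a review record needs at least one acceptance criterion")
    return json.dumps(asdict(constraints), indent=2, sort_keys=True)
```

Sorted keys and fixed indentation keep the file stable across edits, so changes to the constraints show up as small diffs in review.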
Key Challenges
The three decisive factors are:
Environment: defines what the agent can see and act upon.
Feedback loop: determines whether errors are amplified or absorbed as improvement signals.
Control system: decides whether scaling generation stabilises or destabilises the organisation.
Neglecting any of these turns a powerful model into a rapid source of problems; addressing them enables stable, long‑term evolution even with modest model capability.
References
Martin Fowler’s blog cites a Thoughtworks article that categorises the harness into context engineering, architectural constraints, and feedback loops. The Trellis and Cursor frameworks are concrete implementations of the described patterns.
Architecture and Beyond
Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly‑available, scalable solutions.
