How Fitness Functions Can Define “Done” for AI‑Driven Software Development
This article explains how AI‑powered agents change software delivery, why traditional notions of task completion no longer apply, and how a Fitness Function‑based harness engineering approach, illustrated with the Routa project, encodes executable, auditable completion criteria, hard‑gate checks, and contract consistency to reliably guide agents through the development loop.
Why "completion" matters in AI‑driven development
When AI agents are integrated into the software delivery pipeline, the traditional team‑based intuition of when a task is finished is no longer reliable. An agent can generate code or fix an error instantly, but hidden problems such as missing contract updates, incomplete test coverage, or semantic drift may remain. To safely let agents participate, the notion of "done" must be expressed as explicit, machine‑readable signals that can be audited and enforced by CI pipelines.
Fitness Function as a completion condition
Originally introduced in evolutionary architecture, a Fitness Function continuously validates that a system satisfies a set of constraints. In the AI era it becomes a completion condition mechanism that enumerates the exact signals (tests, contract checks, security scans, etc.) that must appear before a task can be considered finished.
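As a concrete illustration of "enumerating the exact signals", a completion condition can be modeled as a set of named signals that must all report success. The names and types below are hypothetical, not part of the Routa codebase:

```python
from dataclasses import dataclass

@dataclass
class CompletionSignal:
    """One machine-readable signal feeding the definition of 'done'."""
    name: str        # e.g. "unit_tests", "contract_check", "security_scan"
    passed: bool     # result reported by the corresponding check
    hard_gate: bool  # True if a failure must block completion outright

def is_done(signals: list[CompletionSignal]) -> bool:
    """A task counts as finished only when every declared signal passes."""
    return all(s.passed for s in signals)

signals = [
    CompletionSignal("unit_tests", passed=True, hard_gate=True),
    CompletionSignal("contract_check", passed=True, hard_gate=True),
    CompletionSignal("security_scan", passed=False, hard_gate=False),
]
print(is_done(signals))  # → False: the security scan has not passed
```

The point is that "done" is a function of explicit signals, not of anyone's judgment.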
Routa fitness architecture
The open‑source Routa project (https://github.com/phodal/routa) embeds the fitness rules directly in the repository so that they are versioned, discoverable by agents, and consumable by CI. The relevant directory layout is:

```
docs/fitness/README.md          # Rule handbook
docs/fitness/unit-test.md       # Test evidence
docs/fitness/api-contract.md    # OpenAPI contract checks
docs/fitness/rust-api-test.md   # API test matrix
docs/fitness/security.md        # Security scan rules
docs/fitness/code-quality.md    # Code‑quality rules
docs/fitness/scripts/…          # Execution scripts
```

When a code change occurs, the AGENTS.md entry triggers the fitness checks, enforcing a "baby‑step" approach in which each modification must satisfy the declared rules before the loop can exit.
Rule format: readable front‑matter
Rules are written as Markdown front‑matter, balancing human readability with machine executability. An example testability rule:
```yaml
---
dimension: testability
weight: 14
threshold:
  pass: 80
  warn: 70
metrics:
  - name: ts_test_pass
    command: npm run test:run 2>&1
    pattern: "Tests\s+(\d+)\s+passed"
    hard_gate: true
---
```

This file declares the metric name, the command to run, the output pattern to match, and whether the metric is a hard gate that blocks the pipeline on failure.
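Extracting that front‑matter from a rule file takes only a few lines. The sketch below assumes PyYAML is available and the helper name `parse_front_matter` is illustrative, not Routa's actual API; the inlined rule uses single‑quoted YAML strings to avoid escaping issues with regex backslashes:

```python
import re

import yaml  # third-party PyYAML; an assumption here, Routa's tooling may differ

FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)

def parse_front_matter(text: str) -> dict:
    """Return the leading YAML front-matter of a rule file as a dict."""
    match = FRONT_MATTER.match(text)
    return yaml.safe_load(match.group(1)) if match else {}

# A rule file body, inlined for illustration instead of read from disk.
rule_text = r"""---
dimension: testability
weight: 14
metrics:
  - name: ts_test_pass
    command: npm run test:run 2>&1
    pattern: 'Tests\s+(\d+)\s+passed'
    hard_gate: true
---
Human-readable rule description follows here.
"""

rule = parse_front_matter(rule_text)
print(rule["metrics"][0]["name"])  # → ts_test_pass
```

Only the block between the `---` markers is parsed; the prose below it stays human documentation.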
Execution engine
The `fitness.py` script scans all `*.md` files under `docs/fitness`, parses the front‑matter, runs the specified commands, captures output, and evaluates the result against the pattern or exit code. A simplified implementation:

```python
import re
import subprocess

def run_metric(metric: dict, dry_run: bool = False) -> tuple[str, bool, str]:
    name = metric.get('name', 'unknown')
    command = metric.get('command', '')
    pattern = metric.get('pattern', '')
    if dry_run:
        return name, True, ''  # skip execution when only listing metrics
    result = subprocess.run(["/bin/bash", "-lc", command],
                            capture_output=True, text=True, timeout=300)
    output = result.stdout + result.stderr
    if pattern:
        # A metric passes when its success pattern appears in the output…
        passed = bool(re.search(pattern, output, re.IGNORECASE))
    else:
        # …otherwise fall back to the command's exit code.
        passed = result.returncode == 0
    return name, passed, output
```

The engine returns the metric name, a boolean indicating pass/fail, and the combined output, allowing CI to block further steps when a hard gate fails.
Contract consistency and hard gates
Routa treats the OpenAPI specification as a single source of truth. Rules such as `openapi_schema_valid` and `api_parity_check` run `npm run api` commands and verify that the output matches success patterns. When `hard_gate` is true, a failure stops the pipeline, providing a definitive Definition of Done for agents.
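For illustration, such a contract rule might be declared with front‑matter like the following. The `npm run api:*` commands and the success patterns are hypothetical placeholders, not Routa's actual scripts:

```yaml
---
dimension: contract
weight: 20
metrics:
  - name: openapi_schema_valid
    command: npm run api:validate 2>&1
    pattern: 'schema valid'
    hard_gate: true
  - name: api_parity_check
    command: npm run api:parity 2>&1
    pattern: 'parity OK'
    hard_gate: true
---
```

Marking both metrics as hard gates means any drift between the specification and an implementation blocks the pipeline outright.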
Evidence files as an engineering ledger
Each rule has a corresponding evidence file that records verification status (`VERIFIED`, `TODO`, `BLOCKED`). For example, an integration‑test evidence file may contain:

```markdown
### Integration test (API behavior)
- notes: process flow
  - status: `VERIFIED`
  - required: create/list/get/delete success/failure loops
  - evidence: `docs/fitness/rust-api-test.md`
- store: workspace
  - status: `TODO`
  - required: CRUD, query filtering, archive consistency
```

These files act as a structured ledger that captures both human‑readable documentation and machine‑consumable verification data.
Hard Gate: the true "done" checkpoint
Hard gates are explicit fail‑fast conditions. If any hard‑gate metric fails, the CI pipeline aborts immediately rather than proceeding to a scoring phase. This mirrors the classic Definition of Done but is enforced automatically for AI agents.
Key takeaways
- Explicit completion signals replace vague, experience‑based judgments.
- Fitness rules live in the repository, making them discoverable by agents and versioned alongside code.
- Front‑matter Markdown provides a readable yet executable format for rules.
- A unified executor (`fitness.py`) centralizes rule interpretation, removing human ambiguity.
- Contract checks prevent semantic drift in multi‑backend systems.
- Hard gates act as definitive blockers, ensuring agents cannot exit the loop until all required conditions are met.
By codifying the definition of "finished" as a set of auditable fitness rules, teams can safely integrate AI agents into code generation, testing, and deployment while avoiding hidden regressions and semantic inconsistencies.
phodal
A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.