Taming AI Code Generation with PDCA: From Prompt to Reliable Delivery
This article explains how applying the classic PDCA (Plan‑Do‑Check‑Act) loop and a Harness engineering layer can transform probabilistic AI code generators like Codex and Claude Code into deterministic, reliable delivery tools for software development, documentation, and automated testing.
AI programming tools such as Codex and Claude Code can generate code, tests, and documentation in seconds, but their output is probabilistic and often unreliable for production use. The article argues that the solution is not better prompts but a disciplined engineering process based on the PDCA (Plan‑Do‑Check‑Act) cycle, combined with a Harness system that makes the loop executable, verifiable, and repeatable.
1. Why PDCA Matters for AI Development
Traditional software development follows deterministic logic, whereas large language models generate output by predicting the next token. This leads to three main problems: (1) they fill missing context with plausible but incorrect information; (2) they may produce code that is locally correct but breaks system-wide constraints; and (3) they forget earlier constraints in multi-step tasks. The result is "engineering hallucinations", such as claiming tests passed when they never ran.
Therefore, AI‑assisted work must be treated as a controlled process rather than a one‑off generation.
2. Mapping PDCA to AI Workflows
The PDCA loop is described as a continuous cycle: Plan → Do → Check → Act → next Plan. Each stage is reinterpreted for AI as follows:
Plan: Define the goal, scope, constraints, inputs, outputs, acceptance criteria, risks, and stop-points. Example prompt (translated):
Goal: Add duplicate‑submission protection to order creation (same user, same parameters within 5 seconds should succeed only once; subsequent attempts return a clear error code).
Scope: Modify only files under internal/order; do not change DB schema or add external dependencies.
Output: Unit tests, implementation code, error‑handling description.
Acceptance: go test ./internal/order/... passes; new tests cover duplicate‑submission scenario.
Risks: Concurrency safety, cache cleanup.
Stop‑point: If the current architecture cannot support a 5‑second window, output a design proposal instead of hard‑coding.
Do: Execute the plan in small, verifiable steps. For code, start with failing tests (TDD); for documentation, write one section at a time; for UI testing, generate a test plan before code.
Check: Verify results with concrete evidence: run tests, lint, type-check, diff, UI screenshots, or fact tables. Do not accept the AI's claim of completion; require observable proof.
Act: Consolidate the experience: turn stable prompts into templates, capture checklist items, update knowledge bases, and add regression tests so the next iteration starts from a higher baseline.
3. Harness Engineering – The Execution Layer
Harness is described as an external control system that does not make the model smarter but makes its execution reliable. A minimal Harness includes:
Task specification (goal, boundaries, acceptance).
Context management (which code, docs, rules, history are fed to the model).
Tool integration (CLI, browsers, databases, Feishu, MCP, etc.).
Permission control (what actions are allowed, when to stop).
Orchestration (plan, checkpoints, sub‑agents, retries, hand‑offs).
Feedback verification (tests, lint, type‑check, reviews, browser checks).
Knowledge memory (KB, history, insights).
Observability and governance (logs, reports, audit trails, finalizers).
With Harness, the workflow changes from a chaotic "human‑prompt‑AI‑human‑check" loop to a disciplined "Plan → AI execution → tool verification → feedback → improvement" pipeline.
4. Practical Example 1 – Go Backend Feature with Claude Code + TDD
The task is to add duplicate‑submission protection to an order‑creation API. The process follows the PDCA‑TDD template:
Plan: Ask Claude to output the files to read, the current workflow, a test design, a minimal implementation plan, concurrency risks, and open questions, without writing any code yet.
Do‑1: Write failing unit tests covering five scenarios: the first submission succeeds; a duplicate within 5 seconds fails; a different user succeeds; different parameters succeed; and the same submission succeeds again after 5 seconds.
Check‑1: Run go test ./internal/order/... and confirm the new tests fail, which shows they actually exercise the requirement rather than passing vacuously.
Do‑2: Implement the minimal code (e.g., a thread‑safe map with TTL) while staying within the declared scope.
Check‑2: Verify that the tests pass, that the diff is limited to internal/order, and that no unrelated code was changed; review for concurrency safety, cache cleanup, and error‑code consistency.
Act: Extract a reusable Go TDD prompt template, a backend feature checklist, a concurrency‑review list, error‑code conventions, and unit‑test samples.
5. Practical Example 2 – Technical Documentation with Codex
The goal is a design doc comparing two internal platforms. The PDCA steps are:
Plan: Generate a document outline, core points, a fact list, and a list of uncertain items.
Do: Write each section (Why, definitions, problem boundaries, how the solutions relate) separately, limiting each to about 800 characters.
Check: Extract a fact table (fact, source, confirmed?, risk) and run role‑based reviews (tech lead, front‑end, back‑end, tester, PM) to check completeness and correctness.
Act: Produce a reusable documentation template (Why → Definitions → Boundaries → Solution → Risks → Roadmap) and a terminology glossary.
6. Practical Example 3 – Page Automation Testing with Codex
A Playwright test suite for an admin order page is built using PDCA:
Plan: List the required mock APIs, test steps, assertions, and risk points.
Do: Generate Playwright test code using stable selectors (role, text, or test‑id) and stable mocks.
Check: Run npx playwright test admin-orders.spec.ts, analyze the failure reports and screenshots, classify each failure as a test‑code error, mock issue, UI bug, or environment problem, and suggest minimal fixes.
Act: Create a page‑test plan template, selector guidelines, mock‑data conventions, a failure‑analysis prompt, and a coverage checklist (loading, empty, error, success).
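The coverage checklist and the four failure categories from this example can be encoded so the Act step leaves something reusable behind. This Go sketch assumes a hypothetical convention in which each test is tagged with the page state it covers; `MissingStates`, `FailureKind`, and `NextAction` are illustrative names, and the suggested fixes are one reasonable policy, not Playwright behavior.

```go
package main

import "fmt"

// RequiredStates is the coverage checklist from the Act step.
var RequiredStates = []string{"loading", "empty", "error", "success"}

// MissingStates returns checklist states no test claims to cover.
// covered maps a test name to the page state it exercises.
func MissingStates(covered map[string]string) []string {
	have := make(map[string]bool)
	for _, state := range covered {
		have[state] = true
	}
	var missing []string
	for _, s := range RequiredStates {
		if !have[s] {
			missing = append(missing, s)
		}
	}
	return missing
}

// FailureKind is the classification made during Check.
type FailureKind int

const (
	TestCodeError FailureKind = iota
	MockIssue
	UIBug
	EnvironmentProblem
)

// NextAction suggests where a minimal fix belongs for each failure kind.
func NextAction(k FailureKind) string {
	switch k {
	case TestCodeError:
		return "fix the spec (selector, wait, or assertion)"
	case MockIssue:
		return "fix the mocked route or fixture data"
	case UIBug:
		return "report the bug; do not weaken the assertion"
	case EnvironmentProblem:
		return "fix the environment and rerun; no code change"
	default:
		return "escalate to a human"
	}
}

func main() {
	covered := map[string]string{
		"shows spinner while fetching": "loading",
		"renders order rows":           "success",
	}
	fmt.Println("missing states:", MissingStates(covered))
	fmt.Println("UI bug:", NextAction(UIBug))
}
```

The key case is `UIBug`: when the page itself is wrong, the fix belongs in the product, and the model should not be allowed to edit the assertion until the test goes green.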
7. Common Pitfalls and Safeguards
Never let the model implement a task without an explicit Plan that defines goal, scope, and acceptance.
Avoid large, one‑shot changes; break work into small checkpoints.
Never trust the AI’s "completed" claim—require test results, diffs, and evidence.
Use independent reviews for high‑risk tasks; AI can assist but not replace human validation.
Define stop‑rules (e.g., two consecutive failed repair attempts trigger manual intervention).
Always capture the experience (templates, checklists, skills) so future work does not start from scratch.
8. Getting Started
Three minimal actions can immediately shift AI work from "free generation" to "controlled delivery":
Require a Plan before any implementation (goal, scope, acceptance, risks, stop‑points).
Enforce a Check gate that demands concrete evidence (tests, lint, screenshots, fact tables).
After three similar successful tasks, distill the workflow into a prompt template, checklist, script, or team guideline.
These steps create a repeatable loop that turns AI from a clever chatbot into a reliable development partner.
9. Conclusion
AI excels at fast generation but lacks self‑control. By embedding the PDCA loop and a Harness execution layer, teams can achieve deterministic, auditable AI‑assisted development. The real value of AI lies not in better prompts but in better processes that let humans focus on goals, boundaries, and quality while the model handles repetitive, well‑defined steps.
This article has been distilled and summarized from source material, then republished for learning and reference.
Nightwalker Tech
[Nightwalker Tech] is the tech sharing channel of "Nightwalker", focusing on AI and large model technologies, internet architecture design, high‑performance networking, and server‑side development (Golang, Python, Rust, PHP, C/C++).
