
Designing a Claude Code Harness for Production‑Grade Java Microservices

The article presents a detailed, production‑focused harness for Claude Code that structures prompts, rules, skills, and external hooks to compensate for LLM shortcomings in Java microservice development, preventing hallucinations, concurrency bugs, and false completions while ensuring reliable code delivery.

Linyb Geek Road

May 25, 2026

Why a Harness Is Needed

LLM‑based code generation produces the most probable token sequence rather than a provably correct program; this is acceptable for demos but can introduce bugs in production. In a seven‑module Java microservice (DDD layers, distributed locks, MQ idempotency, cascading callbacks), the code agent must avoid concurrency issues, transaction boundary violations, and API incompatibilities.

Harness Goal and Co‑Creation

The harness uses the project’s engineering structure to compensate for the LLM’s cognitive gaps. Claude Code’s /init command auto‑generates a CLAUDE.md draft; community plugins (Superpowers, PUA) provide mature skill methodologies. The author acts as an architect, defining structure, reviewing output, and guiding the agent’s self‑iteration.

Six Predictable LLM Failure Modes

Hallucinated implementation: writes new methods or invents API signatures without reading existing code, leading to duplicated code.

Premature coding: starts implementing a service from a one-sentence requirement, causing heavy rework.

Concurrency blind spot: generates SELECT → if absent → INSERT patterns that throw DuplicateKeyException under load.

Scope creep: fixes a query bug and unintentionally refactors DTOs, breaking API compatibility.

False completion: claims tests passed without actually running them, resulting in CI failures.

Learned helplessness: defers to manual checks or blames failures on the environment, wasting time.

Each layer of the harness is designed to block one or more of these patterns.

Four‑Layer Defense Architecture

┌────────────────────────────────────────────────────┐
│ Layer 4: Hook (external code, cannot bypass)       │
├────────────────────────────────────────────────────┤
│ Layer 3: Rules (domain knowledge constraints)      │
├────────────────────────────────────────────────────┤
│ Layer 2: Skill + Routing (methodology constraints) │
├────────────────────────────────────────────────────┤
│ Layer 1: PUA + Memory (motivation & correction)    │
└────────────────────────────────────────────────────┘

Layers 1‑3 are prompt‑engineering constructs; Layer 4 is an immutable shell hook that the LLM cannot override.

CLAUDE.md – The System Prompt Extension

Claude Code's /init command scans pom.xml, the module directories, and the existing code style, and produces a draft that covers 60-70% of the project metadata. Engineers then enrich it with business invariants, concurrency constraints, delivery-gate checks, and a skill routing table.

Core Template (excerpt)

# Global Rules
## Development Pre‑questions
1. What business invariant does this change affect?
2. What failure scenario and rollback path exist?
3. Which command or test proves the change works?
## Execution Order
1. Read relevant code, interfaces, SQL – no guessing.
2. Reuse existing implementations when possible.
3. Verify each logical unit immediately.
4. Modify only code within the request scope.
## Java Hard Constraints
- New public fields must remain backward compatible.
- Write paths need idempotent keys and concurrency protection.
- Critical branches must log entity ID, operation, result.
- External calls require explicit timeout and circuit‑breaker.
## Delivery Triple‑Gate
1. verification‑before‑completion (run command & paste output)
2. requesting‑code‑review
3. delivery‑gate DoD checklist

The three pre-questions force the LLM to reason about invariants, failure paths, and verification before it generates code. The ordered execution steps guard against hallucination, duplicated effort, and scope creep.

Rules File Design

Only encode what the AI would get wrong; omit obvious Java knowledge. Example for data consistency:

## Update / Upsert Semantics
- Prefer `INSERT … ON DUPLICATE KEY UPDATE` for atomic writes.
- Prohibit “select-then-insert” patterns; under high concurrency they throw `DuplicateKeyException`.
- Append `id = LAST_INSERT_ID(id)` to the `ON DUPLICATE KEY UPDATE` clause, so that `LAST_INSERT_ID()` (and the generated key returned to the application) refers to the existing row when the update path runs.
- Use `COALESCE(VALUES(x), x)` for nullable fields, so that a NULL input does not overwrite a stored value.
- Split batch reads/writes into chunks of `_SIZE` to prevent slow queries.

These rules prevented at least five concurrent-write bugs in the author's project.
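Below is a minimal Java sketch of the upsert and batching rules. It assumes MySQL, a table with an AUTO_INCREMENT primary key named id, and a UNIQUE key on the business column. The class UpsertSql, its methods, and the t_order example table are hypothetical, and batchSize stands in for the project's _SIZE constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public final class UpsertSql {

    private UpsertSql() {}

    /**
     * Builds one atomic MySQL upsert (hypothetical helper).
     * Assumes an AUTO_INCREMENT primary key `id`, a UNIQUE key among `columns`,
     * and trusted identifiers: table/column names are concatenated, so never pass user input.
     */
    public static String build(String table, List<String> columns, List<String> updatable) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner params = new StringJoiner(", ");
        for (String c : columns) {
            cols.add(c);
            params.add("?"); // values are bound later through a PreparedStatement
        }
        StringJoiner updates = new StringJoiner(", ");
        // On the update path, make LAST_INSERT_ID() report the existing row's id.
        updates.add("id = LAST_INSERT_ID(id)");
        for (String c : updatable) {
            // A NULL incoming value keeps the stored value instead of overwriting it.
            updates.add(c + " = COALESCE(VALUES(" + c + "), " + c + ")");
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    /** Splits a batch into chunks of at most batchSize (stand-in for the project's _SIZE). */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(build("t_order", List.of("order_no", "status"), List.of("status")));
        System.out.println(partition(List.of(1, 2, 3, 4, 5), 2));
    }
}
```

For the example call, build returns `INSERT INTO t_order (order_no, status) VALUES (?, ?) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id), status = COALESCE(VALUES(status), status)`. The business key order_no is left out of the update list on purpose. MySQL 8.0.20 and later deprecate VALUES() in this clause in favor of a row alias (`INSERT ... AS new ... UPDATE status = COALESCE(new.status, status)`), so the generated form may need adjusting for newer servers.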

Rules Splitting Strategy

data-consistency.md: SQL, transactions, idempotency (loaded on write operations).

clean-code-architecture.md: dependency direction, SOLID, function design (loaded when new classes or methods are added).

java-code-style.md: collection, concurrency, and naming traps (loaded on Java edits).

java-runtime.md: JVM, thread-pool, and connection-pool baselines (loaded on infrastructure changes).

delivery-gate.md: pre-delivery verification checklist (loaded when the agent claims completion).

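Splitting keeps each loaded rule set small. That reduces the attention decay that sets in as the context window fills, and raises the chance that the relevant rule is actually applied.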
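The article does not say how each file is selected. One way to picture the trigger list is as a lookup from the kind of change to the rule files that should enter the context. The sketch below uses a hypothetical ChangeKind enum and RuleSelector class; it is an illustration of the idea, not a Claude Code feature.

```java
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class RuleSelector {

    private RuleSelector() {}

    /** Hypothetical classification of what the current task touches. */
    public enum ChangeKind { WRITE_PATH, NEW_CLASS_OR_METHOD, JAVA_EDIT, INFRA, COMPLETION_CLAIM }

    // Trigger table mirroring the rule split: kind of change -> rule file to load.
    private static final Map<ChangeKind, String> RULES = Map.of(
            ChangeKind.WRITE_PATH, "data-consistency.md",
            ChangeKind.NEW_CLASS_OR_METHOD, "clean-code-architecture.md",
            ChangeKind.JAVA_EDIT, "java-code-style.md",
            ChangeKind.INFRA, "java-runtime.md",
            ChangeKind.COMPLETION_CLAIM, "delivery-gate.md");

    /** Returns only the rule files relevant to this change, in enum order, without duplicates. */
    public static List<String> select(Set<ChangeKind> kinds) {
        Set<String> files = new LinkedHashSet<>();
        for (ChangeKind k : ChangeKind.values()) {
            if (kinds.contains(k)) {
                files.add(RULES.get(k));
            }
        }
        return List.copyOf(files);
    }

    public static void main(String[] args) {
        // A query fix that edits Java code and touches a write path:
        System.out.println(select(EnumSet.of(ChangeKind.JAVA_EDIT, ChangeKind.WRITE_PATH)));
    }
}
```

Files for unrelated triggers, such as java-runtime.md during a pure SQL fix, never enter the context. This is the effect the split is meant to achieve.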

Skill Routing – From Intent to Methodology

Skills encode process discipline; each one targets a specific failure mode.

brainstorming: a 9-step design flow with a hard gate that blocks implementation until a human approves the design (prevents premature coding).

writing-plans: converts design docs into executable plans (prevents unclear implementation paths).

test-driven-development: enforces red-green-refactor, with no production code before a failing test and no completion claim before the tests pass (prevents false completion).

systematic-debugging: a 4-phase root-cause analysis that forces a re-evaluation of the approach after three failed attempts (prevents learned helplessness).

verification-before-completion: requires fresh verification evidence before the agent may claim it is done (blocks overconfidence).

requesting-code-review: dispatches an independent sub-agent to review the change (avoids self-review blind spots).

dispatching-parallel-agents: runs independent sub-tasks in parallel sub-agents so their contexts do not pollute one another (improves efficiency).
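The verification-before-completion rule can be read as a simple admission check on a "done" claim. The sketch below is not the Superpowers implementation. It assumes a hypothetical Evidence record (command, exit code, captured output, time of the run) and rejects the claim unless the evidence exists, was captured after the last code edit, succeeded, and contains real output.

```java
import java.time.Instant;
import java.util.Optional;

public final class CompletionGate {

    private CompletionGate() {}

    /** Hypothetical record of one verification run attached to a completion report. */
    public record Evidence(String command, int exitCode, String output, Instant ranAt) {}

    /**
     * Returns a rejection reason, or Optional.empty() if the "done" claim may be accepted.
     * evidence is null when the agent claims completion without running anything.
     */
    public static Optional<String> check(Evidence evidence, Instant lastEditAt) {
        if (evidence == null) {
            return Optional.of("no evidence: run the verification command and paste its output");
        }
        if (evidence.ranAt().isBefore(lastEditAt)) {
            return Optional.of("stale evidence: code changed after '" + evidence.command() + "' ran");
        }
        if (evidence.exitCode() != 0) {
            return Optional.of("verification failed with exit code " + evidence.exitCode());
        }
        if (evidence.output() == null || evidence.output().isBlank()) {
            return Optional.of("no output captured: paste the actual command output");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Instant lastEdit = Instant.parse("2026-05-25T10:00:00Z");
        // Sample values for illustration only.
        Evidence run = new Evidence("mvn test", 0, "BUILD SUCCESS", lastEdit.plusSeconds(300));
        System.out.println(check(run, lastEdit).orElse("accepted"));
        System.out.println(check(null, lastEdit).orElse("accepted"));
    }
}
```

The check ties freshness to the last edit rather than to wall-clock age. A test run from before the final change proves nothing about the final code.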

The PUA plugin adds behavioral pressure through method routing, failure escalation, three red lines, and owner awareness. In this division of labor, PUA makes the LLM “want” to do the right thing, while Superpowers tells it “how” to do it.

Hook Layer – The Unbypassable Safety Gate

External shell scripts intercept dangerous commands before the agent executes them. For example, pre-tool-safety-check.sh blocks rm -rf, DROP TABLE, git push --force, and similar commands, and returns a JSON decision. Because this layer is ordinary code rather than a prompt, it acts as a constitutional safeguard that does not depend on the LLM's compliance.
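The article's hook is a shell script; the minimal sketch below ports the same idea to Java to keep the examples in one language. It makes three assumptions, based on our reading of the Claude Code hooks documentation (verify against the current version). First, the hook is registered as a PreToolUse command hook for the Bash tool. Second, it receives the tool call as JSON on stdin. Third, exit code 2 blocks the call and stderr is shown to the agent. This exit-code convention replaces the JSON decision the article mentions. The payload is scanned as raw text rather than parsed, which is deliberately coarse, and the class name and pattern list are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public final class PreToolSafetyCheck {

    private PreToolSafetyCheck() {}

    // Illustrative deny-list; a real project would tune this to its own risks.
    private static final List<Pattern> DENY = List.of(
            Pattern.compile("\\brm\\s+-(rf|fr)\\b"),
            Pattern.compile("(?i)\\bdrop\\s+table\\b"),
            Pattern.compile("\\bgit\\s+push\\s+(.*\\s)?(--force|-f)\\b"));

    /** Returns the matching pattern if the text contains a forbidden command. */
    public static Optional<String> check(String text) {
        for (Pattern p : DENY) {
            if (p.matcher(text).find()) {
                return Optional.of(p.pattern());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the tool-call payload arrives as JSON on stdin; it is scanned as raw text.
        String payload = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        Optional<String> hit = check(payload);
        if (hit.isPresent()) {
            System.err.println("Blocked by pre-tool safety check: matches " + hit.get());
            System.exit(2); // assumed convention: exit code 2 blocks the tool call
        }
    }
}
```

With Java 11 or later, the file can be registered directly as the hook command (`java PreToolSafetyCheck.java`) without a separate compile step. JVM startup adds latency to every tool call, which is one reason a shell script is the more common choice.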

Memory Layer – Persistent Human Feedback

When engineers correct the LLM, the feedback is stored as a memory entry (e.g., “code-quality zero tolerance”). The LLM reads this memory file in later sessions, so each correction works like a training signal without retraining the model.

Practical Upsert Walk‑through

Without the harness, the LLM generates naïve upsert code that works in single‑thread tests but throws DuplicateKeyException at QPS > 10. With the harness, the process follows:

Layer 1 (PUA) triggers owner awareness about concurrency.

Layer 2 (brainstorming) produces a design doc that identifies the upsert scenario.

Layer 3 (rules) injects the upsert rule, leading the LLM to generate atomic INSERT … ON DUPLICATE KEY UPDATE SQL.

Layer 2 (TDD) requires a concurrent test with two threads.

Layer 2 (verification) runs the test and captures output.

Layer 4 (hook) checks the delivery‑gate checklist, confirming idempotency.

The result is a thread‑safe, fully tested upsert implementation on the first attempt.
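The two-thread test from the TDD step can be sketched without a database. FakeTable below is a hypothetical in-memory stand-in for a table with a UNIQUE key, not a real DAO:

- upsertNaive mimics select-then-insert and throws, as DuplicateKeyException would, when its insert loses the race.
- upsertAtomic performs the whole write in one step, as INSERT … ON DUPLICATE KEY UPDATE does.

A CountDownLatch releases all writers together so they hit the same key at nearly the same moment. In a real project, the same structure would run against the actual repository and a test database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ConcurrentUpsertTest {

    /** Hypothetical in-memory stand-in for a table with a UNIQUE key (not a real DAO). */
    public static final class FakeTable {
        private final ConcurrentHashMap<String, Integer> rows = new ConcurrentHashMap<>();

        // Select-then-insert: both threads can see "absent", and the losing insert fails.
        public void upsertNaive(String key, int qty) {
            if (!rows.containsKey(key)) {                  // SELECT
                if (rows.putIfAbsent(key, qty) != null) {  // INSERT hits the unique key
                    throw new IllegalStateException("duplicate key: " + key);
                }
            } else {
                rows.merge(key, qty, Integer::sum);        // UPDATE
            }
        }

        // One atomic step per key, like INSERT ... ON DUPLICATE KEY UPDATE qty = qty + ?.
        public void upsertAtomic(String key, int qty) {
            rows.merge(key, qty, Integer::sum);
        }

        public int rowCount() { return rows.size(); }

        public int valueOf(String key) { return rows.getOrDefault(key, 0); }
    }

    /** Releases `threads` writers at once against the same key; rethrows any writer failure. */
    public static FakeTable race(int threads, boolean atomic) {
        FakeTable table = new FakeTable();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            results.add(pool.submit(() -> {
                ready.countDown();
                start.await();                             // wait for the common start signal
                if (atomic) {
                    table.upsertAtomic("order-42", 1);
                } else {
                    table.upsertNaive("order-42", 1);
                }
                return null;
            }));
        }
        try {
            ready.await();
            start.countDown();
            for (Future<?> f : results) {
                f.get();                                   // surfaces exceptions thrown by writers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("concurrent upsert failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return table;
    }

    public static void main(String[] args) {
        FakeTable t = race(2, true);
        System.out.println("rows=" + t.rowCount() + ", qty=" + t.valueOf("order-42"));
    }
}
```

Only the atomic variant can be asserted deterministically. Whether race(2, false) throws depends on thread timing, which is exactly why the naive version passes single-thread tests and then fails under load.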

Adoption Roadmap

Day 1: run /init to generate CLAUDE.md, write pre-tool-safety-check.sh, and start using the agent.

Week 1: capture each agent mistake as natural-language feedback, convert it into a rule file, and enrich CLAUDE.md with the three questions, the execution order, and the top hard constraints.

Month 1: install Superpowers, enable brainstorming and verification-before-completion, and add the skill routing table to CLAUDE.md.

Quarter 1: complete the skill suite (TDD, debugging, code review), split rules by domain, and optionally add the behavior-pressure layer.

Conclusion

The harness is not a sign of distrust in AI; it acknowledges that LLMs lack self‑reflection, doubt, and disciplined verification. By providing structured prompts, hard constraints, external safety hooks, and persistent memory, the harness gives the LLM the “habit of doubt, the discipline of verification, and the courage to admit ignorance” needed for reliable production code.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
