How CLAUDE.md Cut Claude Code Errors from 41% to 3% – What Really Changed?

The author reports a personal experiment where adding a concise CLAUDE.md file to 30 repositories reduced Claude Code's error rate from 41% to 3%, explains why such a tiny contract influences agent behavior, expands the original four Karpathy rules into twelve practical guidelines, and offers concrete advice on writing, structuring, and maintaining effective CLAUDE.md files.

Architect

The author measured Claude Code’s error rate across 30 codebases over six weeks and observed a drop from 41% to 3% after introducing a CLAUDE.md file, but stresses that this is a personal experiment and not a universally applicable result. CLAUDE.md is not a mysterious prompt; it is a repository‑level behavior contract that records recurring engineering mistakes so the agent can read them before each task.

Karpathy’s original four rules form the core of the contract:

1. State assumptions before making non‑trivial changes.
2. Prefer the simplest solution.
3. Make only the necessary, surgical changes.
4. Loop on success criteria and verify outcomes.

These four rules were expanded to twelve concrete rules that address common failure modes such as silent assumptions, over‑complexity, unintended modifications, and unverified success:

1. Think Before Coding – declare assumptions.
2. Simplicity First – choose the minimal viable approach.
3. Surgical Changes – limit the change radius.
4. Goal‑Driven Execution – verify against success criteria.
5. Model for Judgment Only – keep deterministic logic in code.
6. Token Budgets – set limits for long tasks and pause for checkpoints.
7. Surface Conflicts – surface pattern conflicts instead of averaging.
8. Read Before Write – read surrounding code before editing.
9. Tests Verify Intent – tests must reflect business intent, not just return values.
10. Checkpoint Steps – insert checkpoints in multi‑step tasks.
11. Match Conventions – follow existing project conventions.
12. Fail Loud – explicitly report uncertainty, skips, or missing verification.
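To make "Fail Loud" and "Goal‑Driven Execution" concrete, here is a minimal Python sketch of a verification record that refuses to call a task verified when checks were skipped or never run. The class name `VerificationReport`, its fields, and the three status strings are hypothetical and chosen for illustration; the article does not prescribe a format.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationReport:
    """Hypothetical record of what was actually verified for one task."""

    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    skipped: dict[str, str] = field(default_factory=dict)  # check name -> reason

    def status(self) -> str:
        # Any failure wins. Skipped checks or an empty report are never
        # reported as success, which is the point of "Fail Loud".
        if self.failed:
            return "failed"
        if self.skipped or not self.passed:
            return "unverified"
        return "verified"

    def summary(self) -> str:
        # List every skip with its reason so the reader sees what is missing.
        lines = [f"status: {self.status()}"]
        lines += [f"skipped {name}: {reason}" for name, reason in self.skipped.items()]
        lines += [f"failed {name}" for name in self.failed]
        return "\n".join(lines)
```

A report with one passing unit test and one skipped integration test yields `unverified` plus the skip reason, which mirrors the rule that skips must be named instead of hidden.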

Real failure examples illustrate why the contract matters: a 503 retry decision left to the model caused token bloat; mixing two error‑handling patterns produced duplicated error handling; a new function silently shadowed an existing one because the agent missed the old implementation.
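The 503 retry case is a natural fit for "Model for Judgment Only": the retry decision can live in a few lines of deterministic code so the agent never has to reason about it. The following is a sketch under stated assumptions, not the author's implementation. The set of retryable statuses, the attempt limit, and the backoff parameters are all illustrative values.

```python
import random

RETRYABLE_STATUSES = {502, 503, 504}  # assumption: statuses treated as transient
MAX_ATTEMPTS = 4                      # assumption: total tries, including the first


def should_retry(status: int, attempt: int) -> bool:
    """Return True if a request that failed on `attempt` (1-based) should be retried."""
    return status in RETRYABLE_STATUSES and attempt < MAX_ATTEMPTS


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter.

    The delay is drawn uniformly from [0, min(cap, base * 2 ** (attempt - 1))].
    """
    return random.uniform(0.0, min(cap, base * 2 ** (attempt - 1)))
```

With the policy in code, the CLAUDE.md rule can shrink to a pointer such as "retry behavior is defined in the retry helper; do not reimplement it".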

Good CLAUDE.md files avoid vague reminders and instead provide concrete, actionable commands. The article shows before/after snippets, turning generic advice like “be careful when running commands” into specific instructions such as “use pnpm test --filter <package> for package‑level tests”.

File size matters: keep the top‑level CLAUDE.md to roughly 80‑120 lines to limit token consumption. Organise context in three layers – a short root file for high‑frequency contracts, project‑specific docs (e.g., docs/agent/project.md) for commands and boundaries, and task‑specific docs for migrations, API endpoints, UI components, etc.
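A line budget is easy to check mechanically. The sketch below counts lines and compares them against a limit; the function names and the default of 120 lines (the upper end of the range above) are assumptions for illustration.

```python
from pathlib import Path


def check_line_budget(text: str, max_lines: int = 120) -> tuple[bool, int]:
    """Return (within_budget, line_count) for the contents of a CLAUDE.md file."""
    line_count = len(text.splitlines())
    return line_count <= max_lines, line_count


def check_file(path: str, max_lines: int = 120) -> str:
    """Read a file and describe whether it fits the line budget."""
    ok, count = check_line_budget(Path(path).read_text(encoding="utf-8"), max_lines)
    verdict = "within" if ok else "over"
    return f"{path}: {count} lines, {verdict} the {max_lines}-line budget"
```

Calling `check_file("CLAUDE.md")` from a repository root would report the current size; content that pushes the file over budget is a candidate for the project‑specific or task‑specific layer.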

A minimal template demonstrates the structure:

# CLAUDE.md

These instructions apply to this repository unless a more specific instruction overrides them.

## Working style
- State assumptions before making non‑trivial changes.
- Keep changes small and scoped to the request.
- Read the surrounding code before editing.
- Match existing conventions even when another style looks cleaner.
- If existing patterns conflict, choose the newer or better‑tested one and say why.
- Do not refactor adjacent code unless the task explicitly asks for it.
- If unsure, stop and ask instead of guessing.

## Verification
- Run the smallest relevant test first.
- If tests are skipped, say exactly which tests were skipped and why.
- Do not report success unless the requested behavior was verified.
- Prefer deterministic tools for formatting, linting, renaming, and validation.

Adoption steps are presented as a numbered checklist:

1. Copy a minimal template (e.g., Forrest Chang’s or the one above).
2. Trim it to ≤120 lines, keeping only rules that truly apply.
3. Add five project commands (install, type‑check, lint, unit test, integration test).
4. Add five safety boundaries (dangerous directories, read‑only files, prohibited commands, data dry‑run, test‑skip reporting).
5. Validate against a real small bug rather than a toy task.
6. Observe whether the agent asks fewer duplicate questions, makes fewer irrelevant changes, and reports verification more clearly.
7. Iterate only on rules that address actual failures; avoid adding rules that look nice but never get exercised.
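The "five project commands" step can be spot‑checked with a simple script. This is a rough sketch that assumes each command is introduced in the file by a plain label such as "lint" or "unit test"; the label list and the function name are hypothetical, and a substring match is only a heuristic.

```python
REQUIRED_COMMANDS = ("install", "type-check", "lint", "unit test", "integration test")


def missing_commands(claude_md: str, required=REQUIRED_COMMANDS) -> list[str]:
    """Return the required command labels that never appear in the file.

    Matching is a case-insensitive substring search, so it can only show
    that a label is absent, not that the documented command is correct.
    """
    lowered = claude_md.lower()
    return [label for label in required if label not in lowered]
```

An empty result means every label is mentioned at least once; whether the commands actually work still has to be confirmed by running them.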

The article warns that CLAUDE.md alone cannot enforce safety; deterministic tools, linting, type checking, tests, hooks, CI, and code review remain essential. It also notes that Anthropic’s documentation recommends keeping a single file under ~200 lines and that, when rules conflict, the model may pick one arbitrarily.
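As one example of deterministic enforcement, a hook or CI step can reject changes that touch protected directories regardless of what the agent was told. The sketch below is an assumption‑laden illustration: the function name and the directory names used in the usage note are hypothetical, and wiring it into a specific hook system is left out.

```python
from pathlib import PurePosixPath


def protected_violations(changed_paths: list[str], protected_dirs: list[str]) -> list[str]:
    """Return the changed paths that are a protected directory or lie inside one.

    Paths are compared component by component, so "migrations_old/x.sql"
    does not match a protected directory named "migrations".
    """
    protected = [PurePosixPath(d) for d in protected_dirs]
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        ancestors = (path, *path.parents)
        if any(p in ancestors for p in protected):
            violations.append(raw)
    return violations
```

For instance, with `protected_dirs=["migrations", "infra/prod"]`, a change to `migrations/001.sql` is flagged while `src/app.py` passes; a CI job could fail the build whenever the returned list is non‑empty.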

Finally, the author stresses that the most valuable outcome is turning yesterday’s agent mistakes into tomorrow’s readable contracts, treating the file as a living artifact that evolves with each incident.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
