From One LLM Call to Working Code: Inside Claude Code’s Agent Harness
This article dissects the leaked Claude Code source, following each stage from user input to delivered, executable code, and shows how a single LLM invocation is wrapped in a meticulously engineered Agent Harness that manages context, tool permissions, concurrency, planning, and error recovery.
The recent source‑code leak of Anthropic’s Claude Code provides a rare, production‑grade view of an AI‑powered programming agent. By following a single user message through the system, we can see how the seemingly trivial LLM call is surrounded by a sophisticated Agent Harness that turns it into a fully functional development assistant.
1. Context Assembly
When a message arrives, the engine builds a massive prompt consisting of:
System prompt generated by buildEffectiveSystemPrompt(), which stitches together environment info, tool usage rules, tone, and safety directives. Parts such as environment detection are produced by computeSimpleEnvInfo() and cached.
Project‑specific CLAUDE.md files loaded from a hierarchy of locations (system, user, project, local). The loading order is deterministic, e.g.,
/etc/claude-code/CLAUDE.md → ~/.claude/CLAUDE.md → ./CLAUDE.md → ./src/app/CLAUDE.md → ./src/app/.claude/CLAUDE.md.
Persistent memory files, task lists, MCP server resources, skill listings, and the full conversation history.
The final prompt can exceed 15,000 tokens, so the harness compresses it later in the loop.
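The hierarchical CLAUDE.md lookup can be sketched as follows. The search order comes from the article; the function name and the path-walking logic are illustrative, not the real internals:

```typescript
// Sketch of the layered CLAUDE.md lookup described above.
// Ordered from most global to most local; later entries win on conflict.
function claudeMdSearchPaths(projectDir: string, cwd: string): string[] {
  const paths = [
    "/etc/claude-code/CLAUDE.md", // system
    "~/.claude/CLAUDE.md",        // user (tilde shown literally for brevity)
    `${projectDir}/CLAUDE.md`,    // project root
  ];
  // Walk from the project root down to the current directory,
  // picking up any intermediate CLAUDE.md files along the way.
  if (cwd !== projectDir && cwd.startsWith(projectDir)) {
    const rel = cwd.slice(projectDir.length + 1).split("/");
    let dir = projectDir;
    for (const part of rel) {
      dir = `${dir}/${part}`;
      paths.push(`${dir}/CLAUDE.md`);
    }
  }
  return paths;
}
```

For example, `claudeMdSearchPaths("/repo", "/repo/src/app")` yields the system and user files first, then `/repo/CLAUDE.md`, `/repo/src/CLAUDE.md`, and `/repo/src/app/CLAUDE.md`.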
2. The Async Query Loop
The core driver is an async generator query() that runs an infinite while (true) loop. Each iteration performs eight steps:
1. Emit a stream_request_start event.
2. Control the size of tool output.
3. Optionally compress the previous response.
4. Append the final system prompt via appendSystemContext().
5. Send the request to Claude and stream the response.
6. Parse the streamed text and tool_use blocks.
7. Run any required tools.
8. Feed tool results back into the conversation and start the next round.
Because the response is streamed, the UI shows a “type‑writer” effect as tokens arrive.
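The shape of that loop can be sketched as an async generator. The event and step names are paraphrased from the article; the types and control flow here are a simplified reconstruction, not the leaked code:

```typescript
// Minimal sketch of the query() driver described above.
type Event =
  | { type: "stream_request_start" }
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

async function* query(
  callModel: (messages: string[]) => AsyncIterable<Event>,
  runTool: (name: string, input: unknown) => Promise<string>,
  messages: string[],
): AsyncGenerator<Event> {
  while (true) {
    yield { type: "stream_request_start" };          // emit start event
    const toolCalls: { name: string; input: unknown }[] = [];
    for await (const ev of callModel(messages)) {    // stream + parse response
      if (ev.type === "tool_use") toolCalls.push(ev);
      yield ev;                                      // "type-writer" effect
    }
    if (toolCalls.length === 0) return;              // model finished: exit loop
    for (const call of toolCalls) {                  // run required tools
      const result = await runTool(call.name, call.input);
      messages.push(`tool_result(${call.name}): ${result}`); // feed back, next round
    }
  }
}
```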
3. Tool Invocation and Permission Layers
Claude can emit tool_use blocks such as:
{
  "type": "tool_use",
  "id": "toolu_123",
  "name": "FileEditTool",
  "input": {
    "file_path": "/Users/me/project/src/app.ts",
    "old_string": "const x = 1",
    "new_string": "const x = 2"
  }
}

Before execution the harness checks four layers:
Reject rules – immediate block (e.g., rm -rf).
Allow rules – fast‑track (e.g., Read, Glob).
Classifier – a 2‑second async heuristic that decides read‑only vs. risky commands.
Interactive prompt – asks the user when the classifier cannot decide.
These rules are defined in settings.json and can be overridden per session with CLI flags such as --allowedTools.
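The four layers compose into a single decision function. This is a sketch: the rule patterns, the verdict type, and the classifier signature are simplified stand-ins for whatever the real code uses:

```typescript
// Sketch of the four-layer permission check described above.
type Verdict = "allow" | "deny" | "ask";

async function checkPermission(
  command: string,
  rejectRules: RegExp[],
  allowRules: RegExp[],
  classify: (cmd: string) => Promise<Verdict | null>, // may time out or give up
): Promise<Verdict> {
  // Layer 1: reject rules block immediately (e.g. rm -rf).
  if (rejectRules.some((r) => r.test(command))) return "deny";
  // Layer 2: allow rules fast-track known-safe tools (e.g. Read, Glob).
  if (allowRules.some((r) => r.test(command))) return "allow";
  // Layer 3: the async classifier decides read-only vs. risky.
  const verdict = await classify(command);
  if (verdict !== null) return verdict;
  // Layer 4: fall through to an interactive prompt for the user.
  return "ask";
}
```

Note the ordering: reject rules run first, so an allow rule can never whitelist a command that a reject rule has blocked.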
4. Concurrency and Batching
Each tool implements an interface with isConcurrencySafe() and isReadOnly(). Before a batch runs, the harness calls partitionToolCalls() to group read-only tools (executed concurrently, up to 10 at a time) and write-capable tools (executed serially). Example batch:
# Batch 1 (concurrent)
GrepTool("login handler")
GrepTool("auth middleware")
GlobTool("**/auth/**/*.ts")
# Batch 2 (serial)
FileEditTool("src/auth/login.ts")
# Batch 3 (concurrent)
FileReadTool("src/auth/test.ts")
FileReadTool("src/auth/types.ts")

Tool results are wrapped in a user message with a tool_result block, linking back to the original tool_use_id. This feedback loop lets Claude adjust its next actions based on concrete outcomes.
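The partitioning itself is straightforward to sketch. The isReadOnly interface and the concurrency cap of 10 come from the article; the batching logic below is a simplified reconstruction:

```typescript
// Sketch of read-only vs. write-capable batch partitioning.
interface ToolCall {
  name: string;
  isReadOnly: boolean;
}

// Group consecutive read-only calls into concurrent batches (capped at
// maxConcurrent) and give each write-capable call its own serial batch.
function partitionToolCalls(calls: ToolCall[], maxConcurrent = 10): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    if (call.isReadOnly && current.length < maxConcurrent) {
      current.push(call);
    } else {
      if (current.length) batches.push(current); // flush the read-only batch
      current = call.isReadOnly ? [call] : [];
      if (!call.isReadOnly) batches.push([call]); // writes run alone, in order
    }
  }
  if (current.length) batches.push(current);
  return batches;
}
```

Running this on the example above produces exactly the three batches shown: two Greps and a Glob together, the FileEdit alone, then the two FileReads together.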
5. Context Management & Compression
After each round the harness checks token usage. When the remaining budget falls below ~13 000 tokens, three compression strategies fire:
Micro‑compression – trims verbose explanations while keeping tool inputs/outputs.
Conversation memory compression – replaces early dialogue with a CompactBoundaryMessage summary.
Reactive compression – on a prompt_too_long error the system performs a final summarisation and retries.
If auto‑compression fails three consecutive times, the harness stops retrying (see MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3).
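The trigger and the failure cap can be sketched together. The two constants come from the article; the return values and control flow are illustrative:

```typescript
// Sketch of the compression trigger with its consecutive-failure cap.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;
const COMPRESSION_THRESHOLD_TOKENS = 13_000; // approximate remaining-budget trigger

let consecutiveFailures = 0;

function maybeCompress(
  remainingTokens: number,
  compress: () => boolean, // returns true on success
): "skipped" | "compressed" | "failed" | "gave_up" {
  // Plenty of budget left: no compression needed this round.
  if (remainingTokens >= COMPRESSION_THRESHOLD_TOKENS) return "skipped";
  // Too many consecutive failures: stop trying entirely.
  if (consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) return "gave_up";
  if (compress()) {
    consecutiveFailures = 0; // success resets the failure streak
    return "compressed";
  }
  consecutiveFailures++;
  return "failed";
}
```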
6. Planning Mode and Tasks
For complex, multi‑step work the harness offers a Plan Mode. On entry, the harness sets toolPermissionContext.mode = 'plan' and injects a read‑only checklist into the system prompt (e.g., “browse code, list similar patterns, propose a plan, await approval”). The model can then produce a structured plan without modifying files.
Once the user approves via ExitPlanMode, write permissions are restored and the plan is stored as a file that guides subsequent actions.
In parallel, a lightweight Task system tracks work items. Tools such as TaskCreateTool, TaskUpdateTool, and TaskListTool manipulate a JSON task list on disk. Dependencies are expressed via blocks and blockedBy, enabling the agent to avoid out‑of‑order modifications. Periodic reminders (getTaskReminderAttachments()) inject the current task state into the prompt.
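The blockedBy dependency check can be sketched as a simple readiness filter. The field names follow the article (only the blockedBy side is shown; blocks is its inverse); the status values and the function itself are assumptions:

```typescript
// Sketch of the on-disk task list with blockedBy dependencies.
interface Task {
  id: string;
  status: "pending" | "in_progress" | "completed";
  blockedBy: string[]; // ids of tasks that must complete first
}

// A pending task is ready only when every blocker is completed.
function readyTasks(tasks: Task[]): Task[] {
  const done = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id),
  );
  return tasks.filter(
    (t) => t.status === "pending" && t.blockedBy.every((id) => done.has(id)),
  );
}
```

With tasks A (completed), B (blocked by A), and C (blocked by B), only B is ready; C stays blocked until B completes, which is exactly the out‑of‑order protection described above.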
7. Sub‑Agents (Recursive Agents)
When an AgentTool is invoked, the harness spawns a new query() loop with its own message history and a restricted tool set. Isolation modes include:
Same CWD – shares the parent’s working directory.
Worktree – creates a separate git worktree so changes are sandboxed.
Background – runs asynchronously while the parent continues its own loop.
Sub‑agents cannot create unlimited grandchildren and inherit the same permission checks, keeping recursion bounded.
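The bounded recursion and inherited permissions can be sketched as follows. The isolation mode names come from the article; the depth cap value, the context shape, and the intersection rule are assumptions for illustration:

```typescript
// Sketch of bounded sub-agent spawning with inherited permissions.
type Isolation = "same-cwd" | "worktree" | "background";

const MAX_AGENT_DEPTH = 2; // assumed cap: keeps recursion bounded

interface AgentContext {
  depth: number;
  isolation: Isolation;
  allowedTools: string[];
}

function spawnSubAgent(
  parent: AgentContext,
  isolation: Isolation,
  requestedTools: string[],
): AgentContext {
  if (parent.depth + 1 > MAX_AGENT_DEPTH) {
    throw new Error("sub-agent recursion limit reached");
  }
  // The child inherits the parent's permission surface, intersected
  // with the restricted tool set requested for this sub-task.
  const allowedTools = requestedTools.filter((t) =>
    parent.allowedTools.includes(t),
  );
  return { depth: parent.depth + 1, isolation, allowedTools };
}
```

The intersection means a sub‑agent can never hold a tool its parent lacks, and the depth check means grandchildren cannot spawn further descendants indefinitely.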
8. Termination & Error Recovery
The loop ends when needsFollowUp = false, returning {reason: 'completed'}. Other termination reasons include max_turns, user aborts, token limits, model errors, and server overload. For each case the harness attempts a specific recovery: increasing output limits, lightweight retries, or falling back to a secondary model.
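The reason-to-recovery mapping reads naturally as a lookup. The reason names follow the article; the recovery strings are paraphrases of its descriptions, not actual log output:

```typescript
// Sketch of the termination-reason to recovery mapping described above.
type TerminationReason =
  | "completed"
  | "max_turns"
  | "user_abort"
  | "token_limit"
  | "model_error"
  | "overloaded";

function recoveryFor(reason: TerminationReason): string {
  switch (reason) {
    case "completed":   return "return result";
    case "token_limit": return "increase output limits, compress, and retry";
    case "model_error": return "lightweight retry";
    case "overloaded":  return "fall back to a secondary model";
    case "max_turns":
    case "user_abort":  return "stop without retry";
  }
}
```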
Conclusion
Claude Code’s architecture demonstrates a full‑stack Agent Harness that:
Assembles a layered prompt from system rules, project files, memory, and skills.
Runs an async streaming loop that interleaves LLM output with tool execution.
Applies a multi‑layer permission system to keep dangerous commands in check.
Partitions tool calls for optimal concurrency.
Feeds tool results back into context, enabling closed‑loop reasoning.
Compresses context dynamically to stay within model limits.
Provides planning and task management primitives for large‑scale work.
Supports bounded recursive sub‑agents.
For anyone building AI agents, the harness – not the language model itself – is the critical piece that turns a single LLM call into reliable, multi‑step code generation.