
Inside Claude Code: How a 500k‑Line AI Programming Tool Leaked and What Its Architecture Reveals

The Claude Code source leak exposed more than 500,000 lines of the AI coding tool's source. This article covers the npm publishing mishap behind the leak and what the code reveals: a layered architecture built on React Ink, a ReAct‑style agent loop, sophisticated tool orchestration, multi‑tier memory management, context compression, security checks, feature flags, and even anti‑distillation defenses.

macrozheng

Apr 10, 2026


Leak Origin

Claude Code is distributed via npm. During the build, Bun automatically emits a source‑map file (cli.js.map) that contains two parallel arrays: sources (file paths) and sourcesContent (the full source text of each file). The release pipeline failed to exclude .map files, so a 59.8 MB map was published to the public npm registry. By parsing the JSON and writing each sourcesContent entry to the path at the same index in sources, the entire client codebase (≈1,906 TypeScript files, 512k lines) can be reconstructed.
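The pairing of sources and sourcesContent is a standard feature of version 3 source maps, so the reconstruction step can be illustrated generically. The sketch below is not taken from any leaked tooling; the names SourceMapV3, normalizeSourcePath and extractSources are hypothetical, and the path clean-up rule is an assumption made for safety.

```typescript
// Minimal shape of a version 3 source map; only the fields used here.
interface SourceMapV3 {
  sources: string[];
  sourcesContent?: (string | null)[];
}

// Drop "", "." and ".." segments so a recovered path can never escape
// the output directory (an assumption; other tools may resolve paths differently).
function normalizeSourcePath(source: string): string {
  return source
    .replace(/\\/g, "/")
    .split("/")
    .filter((part) => part !== "" && part !== "." && part !== "..")
    .join("/");
}

// Pair sources[i] with sourcesContent[i] and return a map of path -> file text.
function extractSources(mapJson: string): Map<string, string> {
  const map = JSON.parse(mapJson) as SourceMapV3;
  const contents = map.sourcesContent ?? [];
  const files = new Map<string, string>();
  map.sources.forEach((source, i) => {
    const content = contents[i];
    // An entry is null (or missing) when the bundler did not embed that file.
    if (typeof content === "string") {
      files.set(normalizeSourcePath(source), content);
    }
  });
  return files;
}
```

Writing the result to disk is then one directory creation and one file write per entry of the returned map.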

Architecture Overview (Six Layers)

CLI & UI layer – renders the terminal interface (React Ink).

Agent loop – the reasoning core that drives tool usage.

Tool system – 40+ built‑in tools plus MCP extensions.

Memory system – three‑tier storage for hot, warm and cold data.

Context‑compression – progressive token‑reduction.

Permissions & security – fine‑grained safety checks.

1️⃣ Agent Loop (ReAct)

The core loop lives in src/query.ts. It is an async generator that repeats the following steps until the model stops requesting tools:

// simplified queryLoop
async function* queryLoop(params) {
  // 1. Context compression
  // 2. Call LLM (streaming)
  // 3. Parse tool_use response
  // 4. Execute tool & get result
  // 5. Append result to history
  // 6. Repeat until no tool_use
}

This implements the ReAct pattern (think → act → observe → think again). Model calls are performed by src/services/api/claude.ts::queryModel, which streams responses and tracks token usage. When the response contains a tool_use block the loop continues; otherwise it terminates.
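To make the think → act → observe cycle concrete, here is a minimal sketch of such a loop as an async generator. It is not the code from src/query.ts: the types ToolUse, ModelReply and Message, the callback signatures CallModel and RunTool, and the maxTurns guard are all assumptions introduced for illustration, and the context‑compression step is left out.

```typescript
type ToolUse = { name: string; input: string };
type ModelReply = { text: string; toolUse?: ToolUse };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical callbacks standing in for the model client and the tool runner.
type CallModel = (history: Message[]) => Promise<ModelReply>;
type RunTool = (call: ToolUse) => Promise<string>;

async function* agentLoop(
  prompt: string,
  callModel: CallModel,
  runTool: RunTool,
  maxTurns = 8, // safety cap, an assumption of this sketch
): AsyncGenerator<Message> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    // Think: ask the model what to do next.
    const reply = await callModel(history);
    const assistant: Message = { role: "assistant", content: reply.text };
    history.push(assistant);
    yield assistant;

    // No tool request means the model is done.
    if (!reply.toolUse) return;

    // Act, then observe: run the tool and feed its result back into history.
    const result = await runTool(reply.toolUse);
    const observation: Message = { role: "tool", content: result };
    history.push(observation);
    yield observation;
  }
}
```

Using a generator lets the caller render each assistant message and tool result as soon as it is produced instead of waiting for the whole exchange to finish.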

2️⃣ Tool Design

All tools are registered in src/tools.ts. The function getAllBaseTools() returns an array of tool classes (e.g., FileReadTool, BashTool, WebFetchTool). A comment warns that this list must stay in sync with the A/B‑testing config because the system‑prompt cache depends on it.

Each tool is instantiated via src/Tool.ts::buildTool. The factory supplies default safety flags:

const TOOL_DEFAULTS = {
  isEnabled: () => true,
  isConcurrencySafe: () => false,
  isReadOnly: () => false,
  isDestructive: () => false,
  toAutoClassifierInput: () => ''
};

Defaults follow a “fail‑closed” model: if a developer forgets to mark a tool as read‑only or concurrency‑safe, it is treated as a potentially dangerous write that must not run in parallel.
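A small sketch shows how such a factory can make forgetting a flag safe. The interfaces ToolFlags and ToolDefinition and the body of buildTool are assumptions; only the default values are taken from the snippet above.

```typescript
interface ToolFlags {
  isEnabled: () => boolean;
  isConcurrencySafe: () => boolean;
  isReadOnly: () => boolean;
  isDestructive: () => boolean;
  toAutoClassifierInput: () => string;
}

// A tool author supplies a name and only the flags they want to override.
interface ToolDefinition extends Partial<ToolFlags> {
  name: string;
}

const TOOL_DEFAULTS: ToolFlags = {
  isEnabled: () => true,
  isConcurrencySafe: () => false,
  isReadOnly: () => false,
  isDestructive: () => false,
  toAutoClassifierInput: () => "",
};

// Hypothetical factory: explicit flags win, everything else uses the defaults.
function buildTool(def: ToolDefinition): ToolFlags & { name: string } {
  return { ...TOOL_DEFAULTS, ...def };
}

const grep = buildTool({
  name: "Grep",
  isReadOnly: () => true,
  isConcurrencySafe: () => true,
});
const custom = buildTool({ name: "Custom" }); // author set no flags at all
```

Here custom.isReadOnly() and custom.isConcurrencySafe() both return false, so a scheduler would run it alone and treat it as a write, while grep opts in to parallel execution explicitly.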

3️⃣ Concurrency & Read‑Write Separation

Tool orchestration lives in src/services/tools/toolOrchestration.ts. The maximum number of concurrent tool executions defaults to 10 but can be overridden with the environment variable CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY:

function getMaxToolUseConcurrency() {
  return parseInt(process.env.CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY || '', 10) || 10;
}

Tools are partitioned into batches. Read‑only tools may run in parallel; a tool that may write is queued and runs on its own, mirroring classic database read‑write separation. After a batch finishes, context modifiers are applied sequentially to guarantee deterministic state updates.
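The article does not show the partitioning code, so the following is only one plausible reading: consecutive concurrency‑safe calls share a parallel batch, and every other call gets a serial batch of its own, with the original order preserved. The names ToolCall, Batch and partitionToolCalls are hypothetical.

```typescript
interface ToolCall {
  name: string;
  concurrencySafe: boolean;
}

interface Batch {
  parallel: boolean;
  calls: ToolCall[];
}

// Group consecutive concurrency-safe calls into one parallel batch;
// any other call becomes its own serial batch. Order is preserved.
function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.concurrencySafe && last !== undefined && last.parallel) {
      last.calls.push(call);
    } else {
      batches.push({ parallel: call.concurrencySafe, calls: [call] });
    }
  }
  return batches;
}
```

For the sequence Read, Grep, Edit, Read this yields three batches: Read and Grep together in parallel, Edit alone, then the final Read. A real scheduler would additionally cap each parallel batch at the concurrency limit.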

4️⃣ System Prompt Caching

Anthropic’s API supports prompt caching. The static part of the system prompt is separated from the dynamic part by the marker __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__. The static section is shared across users and cached, while the dynamic section (e.g., current time, git status, user‑specific configuration) is generated per request.

export const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__';
const sections = [
  // static blocks …
  ...(shouldUseGlobalCacheScope() ? [SYSTEM_PROMPT_DYNAMIC_BOUNDARY] : []),
  // dynamic blocks …
];
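How a consumer of that sections array might use the marker can be sketched as follows. The function splitAtBoundary and the SplitPrompt shape are hypothetical, and treating a prompt without a marker as entirely per‑request is an assumption of this sketch.

```typescript
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

interface SplitPrompt {
  cacheable: string[]; // identical for every user, safe to cache
  perRequest: string[]; // regenerated on each request
}

function splitAtBoundary(sections: string[]): SplitPrompt {
  const index = sections.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  // Without a marker, treat nothing as globally cacheable (an assumption).
  if (index === -1) {
    return { cacheable: [], perRequest: sections };
  }
  return {
    cacheable: sections.slice(0, index),
    perRequest: sections.slice(index + 1),
  };
}
```

Everything before the marker must stay byte‑for‑byte identical between requests for the cache to hit, which is why volatile values such as the current time belong after it.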

5️⃣ Retrieval Strategy (No RAG)

Claude Code does not use retrieval‑augmented generation (RAG) backed by a vector database. Instead it performs plain grep searches over the memory directory and historical JSONL logs. The function src/memdir/memdir.ts::buildSearchingPastContextSection builds shell commands such as:

const memSearch = `grep -rn "<search term>" ${autoMemDir} --include="*.md"`;
const transcriptSearch = `grep -rn "<search term>" ${projectDir} --include="*.jsonl"`;

6️⃣ Three‑Tier Memory Architecture

Hot memory (MEMORY.md) – a ~200‑line, ≤25 KB index loaded into every prompt.

Warm memory (topic files) – on‑demand files such as user_role.md (up to 5 per session).

Cold memory (historical .jsonl logs) – searched with grep when needed.

Truncation logic first cuts by line count, then by byte size, and finally appends a warning so the model knows the index is incomplete:

export const MAX_ENTRYPOINT_LINES = 200;
export const MAX_ENTRYPOINT_BYTES = 25000;
function truncateEntrypointContent(raw) {
  // line‑based cut → byte‑based cut → add warning
}
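The stub above only names the three stages, so here is one way they could be filled in. This is a sketch under assumptions: the byte cut drops whole trailing lines rather than splitting a line, the limits are exposed as parameters for testing, and the warning wording is invented.

```typescript
const MAX_ENTRYPOINT_LINES = 200;
const MAX_ENTRYPOINT_BYTES = 25000;
const encoder = new TextEncoder();

function truncateEntrypointContent(
  raw: string,
  maxLines = MAX_ENTRYPOINT_LINES,
  maxBytes = MAX_ENTRYPOINT_BYTES,
): string {
  let lines = raw.split("\n");
  let truncated = false;

  // 1. Line-based cut.
  if (lines.length > maxLines) {
    lines = lines.slice(0, maxLines);
    truncated = true;
  }

  // 2. Byte-based cut: drop trailing lines until the UTF-8 size fits.
  while (lines.length > 0 && encoder.encode(lines.join("\n")).length > maxBytes) {
    lines.pop();
    truncated = true;
  }

  // 3. Tell the model the index is incomplete (hypothetical wording).
  const body = lines.join("\n");
  return truncated ? `${body}\n[MEMORY.md truncated: index is incomplete]` : body;
}
```

Measuring bytes with TextEncoder rather than string length matters for non‑ASCII notes, where one character can occupy several bytes.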

7️⃣ Five‑Level Context Compression

Snip – drop tool‑call payload, keep only structure.

Micro‑compact – move large results to an external cache.

Context Collapse – summarize middle dialogue.

Auto‑compact – trigger full‑prompt summarisation when a token threshold is crossed.

Reactive Compact – emergency compression on API 413 errors.

Modules are lazily required based on feature flags:

const reactiveCompact = feature('REACTIVE_COMPACT')
  ? require('./services/compact/reactiveCompact.js')
  : null;
const contextCollapse = feature('CONTEXT_COLLAPSE')
  ? require('./services/contextCollapse/index.js')
  : null;

An auto‑compact circuit breaker stops retrying after three consecutive failures, which avoids runaway API calls:

const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;
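A counter that resets on success is enough to implement this kind of breaker. The class below is a sketch; AutoCompactBreaker and its method names are hypothetical, and only the constant comes from the source.

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

class AutoCompactBreaker {
  private consecutiveFailures = 0;

  // False once the failure streak has reached the limit.
  canAttempt(): boolean {
    return this.consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
  }

  // Any success ends the streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }
}
```

Because only consecutive failures count, occasional errors spread across a long session never trip the breaker.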

8️⃣ Security & Permissions

Even in “YOLO” mode (--dangerously-skip-permissions), actions pass through a shadow classifier (src/utils/permissions/yoloClassifier.ts) that returns allow, soft_deny or hard_deny. Bash commands undergo 23 distinct checks (Unicode whitespace, Zsh dangerous commands, IFS injection, etc.) to prevent command‑injection attacks:

const BASH_SECURITY_CHECK_IDS = {
  INCOMPLETE_COMMANDS: 1,
  JQ_SYSTEM_FUNCTION: 2,
  // …
  UNICODE_WHITESPACE: 18,
  ZSH_DANGEROUS_COMMANDS: 20,
  // total 23 checks
};
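To give a feel for what a single check might look like, here is a sketch of a Unicode whitespace detector. The function name and the exact character list are assumptions (the list covers the non‑ASCII space and line‑separator code points in Unicode's White_Space set); the real check may differ.

```typescript
// Non-ASCII whitespace that renders like a normal space or line break
// but may be tokenized differently by a shell.
const UNICODE_WHITESPACE =
  /[\u0085\u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]/;

function hasUnicodeWhitespace(command: string): boolean {
  return UNICODE_WHITESPACE.test(command);
}
```

A command such as "rm\u00A0-rf" looks like two words on screen, yet a shell that splits only on ASCII whitespace sees a single token, which is the kind of mismatch this check is meant to flag.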

Permission evaluation aggregates results from runtime mode, user hooks, the YOLO classifier, Bash security checks and a rule engine, always taking the most restrictive outcome.
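Taking the most restrictive outcome amounts to a maximum over an ordered set of decisions. The sketch below assumes every source reports one of the classifier's three values and that an empty list should deny; both are assumptions, as are the names Decision, SEVERITY and mostRestrictive.

```typescript
type Decision = "allow" | "soft_deny" | "hard_deny";

// Higher number means more restrictive.
const SEVERITY: Record<Decision, number> = {
  allow: 0,
  soft_deny: 1,
  hard_deny: 2,
};

function mostRestrictive(decisions: Decision[]): Decision {
  // With no opinions at all, deny rather than allow (a fail-closed assumption).
  if (decisions.length === 0) return "hard_deny";
  return decisions.reduce((worst, d) =>
    SEVERITY[d] > SEVERITY[worst] ? d : worst,
  );
}
```

With this rule a single hard_deny from any layer overrides any number of allow results, so adding a new check can only tighten the outcome.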

9️⃣ Feature Flags & Roadmap

Experimental capabilities are guarded by feature('NAME') checks. Examples include:

KAIROS – a 24‑hour “assistant” mode with auto‑dreaming.

COORDINATOR_MODE – multi‑agent collaboration (research → synthesis → implementation → verification).

VOICE_MODE – speech input/output.

WEB_BROWSER_TOOL – browser automation.

Feature‑flagged modules are loaded conditionally via require, which allows unused code to be tree‑shaken out of a build and lets features roll out gradually (gray release) without code rollbacks.

🔟 Anti‑Distillation & Undercover Mode

To poison data‑collection attempts, the CLI adds a fake_tools entry to the request's anti_distillation field when the flag ANTI_DISTILLATION_CC is enabled and the entry point is the CLI:

if (feature('ANTI_DISTILLATION_CC') && process.env.CLAUDE_CODE_ENTRYPOINT === 'cli') {
  result.anti_distillation = ['fake_tools'];
}

When Anthropic engineers contribute to public repositories, src/utils/undercover.ts strips internal model codenames, project names and other identifiers. The mode cannot be forced off, ensuring no internal information leaks.

Additional Engineering Details

Dynamic imports (await import()) keep the CLI fast; --version returns immediately without loading the rest of the bundle.

An early‑input capture module (src/utils/earlyInput.ts) buffers keystrokes while the main bundle loads, replaying them after initialization.

A TCP/TLS pre‑connect (src/utils/apiPreconnect.ts) runs in parallel with startup to shave ~100 ms off the first API call.
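The early‑input idea in particular is simple to sketch: hold keystrokes in a queue until a handler exists, then replay them in order. The class below is illustrative only; EarlyInputBuffer and its methods are hypothetical and not taken from src/utils/earlyInput.ts.

```typescript
class EarlyInputBuffer {
  private buffered: string[] = [];
  private handler: ((key: string) => void) | null = null;

  // Called for every keystroke, both before and after initialization.
  push(key: string): void {
    if (this.handler) {
      this.handler(key);
    } else {
      this.buffered.push(key);
    }
  }

  // Called once the main bundle is ready: replay in order, then go live.
  attach(handler: (key: string) => void): void {
    this.handler = handler;
    for (const key of this.buffered) {
      handler(key);
    }
    this.buffered = [];
  }
}
```

From the user's point of view, anything typed during startup simply appears once the interface is ready, in the order it was typed.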

Overall, the leak reveals that Claude Code’s strength lies not in novel algorithms but in meticulous engineering: a ReAct‑style agent loop, robust tool orchestration with read‑write separation, layered memory, aggressive context compression, and defense‑in‑depth security. The source code serves as a concrete blueprint for building high‑quality AI‑augmented developer tools.


