Inside Claude Code: How Anthropic’s Programming Agent Handles Architecture, Memory, and Context

This article provides a detailed technical walkthrough of Claude Code, Anthropic’s AI programming agent, covering its core architecture, the four‑layer engine design, the shift from ReAct to a streamlined Tool‑Use Loop, sophisticated system prompts, a structured memory subsystem, and a five‑step context compression strategy that keeps the model within token limits while preserving essential information.


1. What Is Claude Code?

Claude Code is Anthropic’s programming agent that runs directly in the terminal, capable of reading, editing, and executing code, as well as managing Git operations. It is an AI Agent rather than a simple chatbot.

2. Architecture Design

The system follows a four‑layer architecture:

Engine Layer: The brain that coordinates user input, system instructions, and model responses. It never contains business logic.

Tool Layer: Over 40 tools (file read/write, shell execution, search, etc.), each with strict type-checked safety attributes (read-only, destructive, concurrent).

Service Layer: Shared infrastructure such as the large-model API, context compression, and the MCP protocol.

Safety & Governance Layer: A global safety net that enforces permission checks, hook insertion, and Bash security analysis.
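The per-tool safety attributes and the governance check on top of them can be sketched as follows. This is an illustrative model, not Claude Code's actual internals; the names `ToolSpec` and `requires_approval` are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool definition carrying the read-only / destructive /
# concurrent safety attributes described above.
@dataclass(frozen=True)
class ToolSpec:
    name: str
    run: Callable[[dict], str]
    read_only: bool = True       # does not mutate files or state
    destructive: bool = False    # mutates state; needs user approval
    concurrent: bool = True      # safe to run in parallel with other tools

def requires_approval(tool: ToolSpec) -> bool:
    """Governance-layer check: anything non-read-only or destructive
    must pass a permission prompt before it runs."""
    return tool.destructive or not tool.read_only

read_file = ToolSpec("read_file", run=lambda args: open(args["path"]).read())
write_file = ToolSpec("write_file", run=lambda args: "written",
                      read_only=False, destructive=True, concurrent=False)

print(requires_approval(read_file))   # False
print(requires_approval(write_file))  # True
```

Declaring safety as data on the tool, rather than logic inside it, is what lets a single governance layer enforce policy uniformly across 40+ tools.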

3. Agent Working Modes

Tool‑Use Loop

Claude Code replaces the classic ReAct "Thought‑Action‑Observation" cycle with a simple while(true) loop that lets the model decide internally whether to tool_use or end_turn. This eliminates token waste from explicit thoughts and relies on the model’s internal "Extended Thinking".
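The loop above can be sketched in a few lines. `call_model` and `run_tool` are illustrative stand-ins, not Claude Code's actual internal API; the stop reasons mirror the Anthropic API's `tool_use` / `end_turn` values.

```python
# Minimal sketch of the tool-use loop: the model, not the framework,
# decides each iteration whether to call a tool or finish its turn.
def agent_loop(call_model, run_tool, user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = call_model(messages)
        if reply["stop_reason"] == "end_turn":
            return reply["text"]  # model chose to finish; loop exits
        # stop_reason == "tool_use": run the requested tool, feed result back
        result = run_tool(reply["tool_name"], reply["tool_input"])
        messages.append({"role": "tool", "content": result})

def fake_model():
    # Stub model for demonstration: asks for one shell call, then ends.
    state = {"step": 0}
    def call(messages):
        state["step"] += 1
        if state["step"] == 1:
            return {"stop_reason": "tool_use",
                    "tool_name": "shell", "tool_input": "ls"}
        return {"stop_reason": "end_turn", "text": "done"}
    return call

print(agent_loop(fake_model(), lambda name, args: f"ran {name}", "list files"))  # done
```

Note there is no explicit "Thought" step in the transcript; any reasoning happens inside the model before it emits a stop reason.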

Plan Mode

For complex tasks, the agent first enters a read‑only planning phase using the EnterPlanMode tool, writes a plan to .claude/plans/, and after user approval executes the plan with ExitPlanMode. This separates exploration from execution.
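A minimal sketch of that gating, under the assumption that plan mode simply blocks every non-read-only tool until the user approves the plan (class and method names here are hypothetical):

```python
# Plan mode as a gate: while planning, only read-only tools may run;
# execution is unlocked when the user approves the plan on exit.
class PlanGate:
    def __init__(self):
        self.plan_mode = False

    def enter_plan_mode(self):
        self.plan_mode = True          # exploration phase begins

    def exit_plan_mode(self, user_approved: bool):
        if user_approved:
            self.plan_mode = False     # execution phase unlocked

    def can_run(self, tool_read_only: bool) -> bool:
        return tool_read_only or not self.plan_mode

gate = PlanGate()
gate.enter_plan_mode()
print(gate.can_run(tool_read_only=False))  # False: writes blocked while planning
gate.exit_plan_mode(user_approved=True)
print(gate.can_run(tool_read_only=False))  # True: approved, writes allowed
```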

4. System Prompt Construction

The prompt is built from many static sections (role definition, safety redlines, behavior rules, tool usage, Git safety, output style) that are identical for every user, followed by a dynamic boundary __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__ that injects per‑user data such as the working directory, model version, and project‑specific files. The static part can be cached globally, reducing API cost by up to 90%.
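The static/dynamic split can be sketched like this. The boundary marker is the one named in the article; the section contents are placeholders, and the function name is an assumption.

```python
# Everything before the boundary is identical for all users and can be
# cached by the API; everything after it is injected per session.
STATIC_SECTIONS = [
    "# Role\nYou are a programming agent running in the terminal...",
    "# Safety\nNever run destructive commands without approval...",
    "# Tools\nPrefer tools over guessing about the code base...",
]
BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"

def build_system_prompt(cwd: str, model: str) -> str:
    static = "\n\n".join(STATIC_SECTIONS)   # cacheable prefix
    dynamic = f"Working directory: {cwd}\nModel: {model}"
    return f"{static}\n\n{BOUNDARY}\n\n{dynamic}"

prompt = build_system_prompt("/home/alice/project", "claude-sonnet")
print(prompt.index(BOUNDARY) > 0)  # True: static prefix precedes dynamic data
```

Keeping the dynamic data strictly after the boundary is what preserves a byte-identical prefix across users, which is the precondition for prompt caching.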

5. Memory System

Claude Code stores only four explicit memory types: user, feedback, project, and reference. It excludes any information that can be derived from the current code base (e.g., file structure, Git history). Each memory entry is a separate .md file with a YAML header (name, description, type), plus a MEMORY.md index limited to 200 lines. Retrieval works in three steps:

Scan the first 30 lines of every memory file to collect headers.

Send the compact list to a lightweight model (Sonnet) that returns the most relevant filenames.

Load the selected files’ full content and inject them as system reminders.
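Step 1 of the retrieval above can be sketched as a cheap header scan. The file layout and field names follow the article; the parsing is a simplified illustration, not Claude Code's actual implementation.

```python
import os
import tempfile

# Read at most the first 30 lines of a memory file and pull out the
# YAML header fields (name, description, type) without loading the body.
def scan_header(path: str, max_lines: int = 30) -> dict:
    fields = {}
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            if ":" in line and not line.startswith("---"):
                key, _, value = line.partition(":")
                if key.strip() in ("name", "description", "type"):
                    fields[key.strip()] = value.strip()
    return fields

# Demo with a throwaway memory file.
d = tempfile.mkdtemp()
path = os.path.join(d, "prefers-pytest.md")
with open(path, "w") as f:
    f.write("---\nname: prefers-pytest\n"
            "description: user prefers pytest over unittest\n"
            "type: user\n---\n(long body never read here)\n")

print(scan_header(path)["type"])  # user
```

The compact name/description list produced this way is what gets handed to the lightweight model in step 2, so full file contents are only loaded for the few entries it selects.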

Memories older than one day receive a freshness warning, and large tool results are persisted to disk with a short preview kept in the message stream.

6. Context Window Management

Claude Code employs a five‑step compression pipeline that escalates only when needed:

Large Result Persistence: Tool outputs larger than ~50 KB are saved to disk; the message keeps a 2 KB preview.

Snip (Old Message Removal): Very old conversation turns are dropped and replaced with a boundary marker, freeing tokens without extra API calls.

Micro-Compact (Tool Output Truncation): Results from repeatable tools (read, shell, grep, etc.) are trimmed, keeping only the most recent N entries.

Context Collapse (Read-Time Projection): Before each API call, a read-time view compresses messages to stay under 90% of the model's window, inserting a boundary token if needed.

Auto-Compact (Full Summarization): When the window exceeds ~93%, the system generates a structured summary via the model, replaces the old context with the summary, and then restores the most recent files and active skills (up to 5 files, 50 KB total) so the agent can continue without re-reading.
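The escalation logic of the pipeline can be sketched as a loop over cheapest-first stages. The 90% threshold comes from the article; the stage functions are illustrative stubs with made-up token savings.

```python
# Each stage runs only if the context is still over budget after the
# previous, cheaper stage: escalate only when needed.
def manage_context(tokens_used: int, window: int, stages) -> int:
    budget = int(window * 0.90)          # stay under 90% of the window
    for name, compact in stages:         # ordered cheapest first
        if tokens_used <= budget:
            break                        # under budget: stop escalating
        tokens_used = compact(tokens_used)
    return tokens_used

stages = [
    ("snip",          lambda t: t - 10_000),  # drop very old turns
    ("micro-compact", lambda t: t - 20_000),  # truncate old tool outputs
    ("auto-compact",  lambda t: t // 4),      # full model-written summary
]

# 195k of a 200k window: snip (185k) is not enough, micro-compact (165k)
# is, so the expensive auto-compact summarization never runs.
print(manage_context(195_000, 200_000, stages))  # 165000
```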

This tiered approach ensures minimal information loss while keeping token usage low.

7. Final Thoughts

The deep dive shows that Claude Code’s strength lies not in raw model size but in the surrounding engineering: clear safety boundaries, modular tool design, disciplined memory handling, and a progressive context‑compression strategy. These “invisible” components act as the brakes, steering wheel, and seatbelts that make a powerful AI agent safe and efficient in real‑world programming tasks.

Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
