Artificial Intelligence 33 min read

How Claude Code’s Memory Mechanism Works: A Deep Dive into the Source Code

This article explains why LLMs are stateless, distinguishes short‑term from long‑term memory needs for agents, critiques common memory solutions, and then details Claude Code’s two‑layer architecture—static CLAUDE.md with six hierarchical files and a dynamic auto‑memory system that uses structured markdown, a lightweight selector model, and aging warnings—to provide a practical, source‑level blueprint for building robust agent memory.

IT Services Circle

Jun 6, 2026

How Claude Code’s Memory Mechanism Works: A Deep Dive into the Source Code

1. LLMs Have No Memory

LLMs are fundamentally stateless; each request re‑processes the system prompt, conversation history, and the new query. The illusion of memory comes from the client re‑sending the entire conversation each turn, limited by the context window.

For agents, this short‑term approach quickly fails because agents need to retain facts such as user profiles, preferences, project milestones, and external references across many interactions.

2. Why Common Memory Schemes Fall Short

Sliding Window : simply discards the oldest turns, which can drop essential facts (e.g., a user’s role) and mix irrelevant data.

Conversation Summarization : uses an LLM to condense history, but important details may be omitted and the summarization step adds token cost.

Vector Retrieval : stores embeddings in a vector DB, yet similarity does not guarantee relevance, results can be noisy, the system is costly to maintain, and the raw vectors are unreadable for debugging.

Hierarchical Storage : splits memory into hot and cold layers, but still relies on vector similarity and doubles the engineering effort.

All four share four hard‑coded problems: free‑form text without constraints, no type distinction, no aging mechanism, and heavy focus on retrieval while neglecting disciplined writing.

3. Claude Code’s Two‑Layer Memory Architecture

Claude Code solves the problem with a parallel static and dynamic layer.

Static Layer – CLAUDE.md Six‑Level System

The static layer is a set of six markdown files that encode deterministic project rules:

Managed : system‑wide policies (admin only).

User : per‑user global preferences.

Project : project‑specific rules committed to Git.

Local : untracked local overrides.

Auto : automatically generated MEMORY.md entries.

Team : shared team‑level auto‑memories (feature‑flagged).

Files can include other CLAUDE.md files via an @include directive, similar to C’s #include, avoiding duplication.

Conditional Rules

Rules can be scoped to file patterns (e.g., only load a front‑end style guide for *.tsx files) using a front‑matter paths field, ensuring token‑efficient, on‑demand injection.

Dynamic Layer – Automatic Memory System

The dynamic layer learns from interactions and writes back to disk as structured markdown files. Only four memory types are allowed, each with a strict schema:

user : immutable user profile facts.

feedback : behavioral preferences with a required Why and How to apply section.

project : time‑sensitive project facts, stored with absolute dates.

reference : external pointers (e.g., URLs, tickets).

Each memory file begins with a YAML front‑matter containing name, description, and type. All memories live under a single directory with a special MEMORY.md index that lists every entry’s name and description.

Writing Memories – The extractMemories Agent

After each conversation round, a background forked agent called extractMemories scans the user’s feedback, compares it against existing memories (using hasMemoryWritesSince to avoid duplicates), and writes new entries that match the four allowed types.

Retrieving Memories – Sonnet Selector

When a new query arrives, Claude Code does not perform vector similarity. Instead, it:

Reads the first 30 lines of every memory file to collect titles and descriptions.

Feeds the compiled list to the Sonnet model with a strict prompt that asks for the top‑5 most certainly useful memories.

Applies filters such as alreadySurfaced (skip memories already shown) and recentTools (skip tool reference docs already in use).

The selected memories are wrapped in a <system‑reminder> block and injected into the system prompt. If a memory is older than two days, a stale warning is added, prompting the model to verify its current validity.

Verification Before Use

Before acting on a memory, the system prompts the model to check the referenced file, function, or flag (e.g., grep the code) to avoid “authoritative errors” where stale information misleads the agent.

4. Design Takeaways for Your Own Agent

Structure over free text : enforce a schema so every memory is classifiable and searchable.

Index‑first, content‑on‑demand : keep a lightweight index in the prompt and load full markdown only when needed.

Cheap selector model : use a small LLM (e.g., Sonnet) to choose relevant memories instead of costly vector databases.

Time awareness & verification : add age warnings and require runtime checks for any referenced code or resources.

5. How to Answer the Interview Question

Start by stating that LLMs are stateless, then point out the four common pitfalls of existing memory solutions. Follow with Claude Code’s two‑layer design, the six static CLAUDE.md levels, the four dynamic memory types, the index‑plus‑on‑demand retrieval, the Sonnet selector, and the aging/verification mechanisms. Conclude with the four portable principles that can be applied to any agent project.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

prompt engineering Agent Architecture LLM memory Dynamic Memory Claude Code static layer vector retrieval alternatives

Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.