How Claude Code Structures Its Memory: A Deep Dive into Multi‑Layered Agent Memory Design

This article dissects Claude Code's memory architecture, explaining its four distinct memory layers, file‑based long‑term storage, dynamic retrieval without embeddings, multi‑stage write paths, and session‑compression strategies, while highlighting design trade‑offs and practical takeaways for building robust AI agents.

1. What Claude Code Remembers

Claude Code categorises memory into four types (user, feedback, project, and reference), defined in memoryTypes.ts. These items represent information that cannot be inferred from the current code state, such as user habits, project context, or external system entry points.
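As a rough sketch, the four categories might be modelled like this (the type and field names below are assumptions for illustration, not the real memoryTypes.ts definitions):

```typescript
// Hypothetical sketch of the four memory categories; field names are
// illustrative and not the actual definitions in memoryTypes.ts.
type MemoryType = "user" | "feedback" | "project" | "reference";

interface MemoryItem {
  type: MemoryType;
  description: string; // the retrieval hook stored in front-matter
  body: string;        // Markdown body of the memory file
}

// A memory records only what cannot be re-derived from the code itself.
const example: MemoryItem = {
  type: "user",
  description: "Prefers pnpm over npm for all package operations",
  body: "The user consistently runs pnpm; suggest pnpm commands first.",
};
```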

2. Four‑Layer Division

Auto/Team Memory (long‑term): stored under paths.ts and teamMemPaths.ts.

Relevant Memories (per‑turn): fetched by findRelevantMemories.ts and attachments.ts.

Session Memory (compression): kept in summary.md and managed by sessionMemory.ts.

Agent Memory (sub‑agent): isolated directories for user, project, or local scopes, implemented in agentMemory.ts.

Separating these concerns clarifies responsibilities and keeps the system stable.
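One compact way to picture the division (only summary.md is a real file name from the article; the other store values are descriptive placeholders):

```typescript
// Illustrative summary of the four layers; only "summary.md" is a real
// file name from the article, the other store values are placeholders.
const MEMORY_LAYERS = {
  autoTeam: { scope: "long-term",   store: "per-user / team memory directories" },
  relevant: { scope: "per-turn",    store: "attached to context, not persisted" },
  session:  { scope: "compression", store: "summary.md" },
  agent:    { scope: "sub-agent",   store: "isolated user/project/local directories" },
} as const;
```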

3. Why Long‑Term Memory Uses Files

Each memory is a separate Markdown file with a front‑matter block, indexed by a central MEMORY.md. This file‑system approach offers easy auditability, manual repair, and low‑cost debugging compared to a database or vector store. The front‑matter must contain a description field, which serves as the retrieval hook.

Trade‑offs include directory‑scan overhead and token‑limit pressure from a growing MEMORY.md index, mitigated by Claude Code's dynamic recall mechanism.
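To make the front-matter contract concrete, here is a minimal parser sketch that pulls out the required description field (the parsing details are an assumption; Claude Code's actual reader may differ):

```typescript
// Minimal front-matter parser sketch: extracts the required `description`
// field from a memory file. Real parsing in Claude Code may differ.
function extractDescription(fileText: string): string | null {
  const m = fileText.match(/^---\n([\s\S]*?)\n---/);
  if (!m) return null; // no front-matter block at all
  const line = m[1].split("\n").find((l) => l.startsWith("description:"));
  return line ? line.slice("description:".length).trim() : null;
}

// A hypothetical memory file with its front-matter block.
const file =
  "---\ndescription: API gateway lives at services/gateway\ntype: project\n---\n\nDetails about the gateway entry point...";
```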

4. Write Path

4.1 Main model writes directly: The system prompt injects memory rules via loadMemoryPrompt() (see memdir.ts), allowing the model to decide when to write a file.

```typescript
loadMemoryPrompt() // injects directory, type, index format, read/write rules into the prompt
```

4.2 Background extractor fallback: If the model does not write, a stop hook triggers extractMemoriesModule.executeExtractMemories (see extractMemories.ts).

```typescript
if (feature('EXTRACT_MEMORIES') && !toolUseContext.agentId && isExtractModeActive()) {
  void extractMemoriesModule!.executeExtractMemories(stopHookContext, toolUseContext.appendSystemMessage);
}
```

The extractor skips writing when a memory file was already created in the current turn, preventing duplicate entries.
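The duplicate guard can be sketched as a simple predicate (TurnState and its field are hypothetical names for the behaviour described above):

```typescript
// Sketch of the extractor's duplicate guard; TurnState and its field
// are hypothetical names, not Claude Code's actual state shape.
interface TurnState {
  memoryFileWrittenThisTurn: boolean;
}

function shouldRunExtractor(state: TurnState, extractEnabled: boolean): boolean {
  if (!extractEnabled) return false;
  // Skip the background write if the main model already wrote a memory
  // file this turn, preventing duplicate entries.
  return !state.memoryFileWrittenThisTurn;
}
```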

4.3 KAIROS mode: Writes are appended to a daily log file; a nightly job distils these logs into topic files and MEMORY.md.

5. Retrieval (Recall) Mechanism

5.1 Static injection: getUserContext() builds a claudeMd field containing stable files such as CLAUDE.md, rules, and MEMORY.md (see context.ts).

```typescript
const claudeMd = shouldDisableClaudeMd ? null : getClaudeMds(filterInjectedMemoryFiles(await getMemoryFiles()))
```

5.2 Dynamic retrieval: After each user turn, startRelevantMemoryPrefetch (in query.ts) triggers a side‑query model that scans the manifest generated by scanMemoryFiles() and returns up to five relevant file names.

```typescript
// Step 1: scan .md files and build manifest
scanMemoryFiles();
// Step 2: side‑query model selects top‑k files
// Step 3: read selected files, truncate, and attach to context
```

This approach avoids embedding construction and index maintenance while keeping recall fast.
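A toy version of the manifest-plus-side-query flow, with a naive keyword scorer standing in for the side-query model (everything below except the manifest idea is illustrative, not Claude Code's actual logic):

```typescript
// Toy sketch of manifest-based recall; the keyword scorer merely stands
// in for the real side-query model.
interface ManifestEntry {
  file: string;
  description: string; // taken from each memory file's front-matter
}

// Pretend this manifest was built by scanning the .md memory files.
const manifest: ManifestEntry[] = [
  { file: "pnpm-preference.md", description: "User prefers pnpm over npm" },
  { file: "gateway-entry.md", description: "API gateway entry point location" },
];

// Score each entry by word overlap with the query and keep the top k.
function selectRelevant(query: string, entries: ManifestEntry[], k = 5): string[] {
  const qWords = new Set(query.toLowerCase().split(/\W+/));
  return entries
    .map((e) => ({
      file: e.file,
      score: e.description.toLowerCase().split(/\W+/).filter((w) => qWords.has(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.file);
}
```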

6. How Memories Enter the Context

System prompt: injects memory usage rules.

User context: adds stable background files via claudeMd.

Attachment: relevant memories become <system‑reminder> messages (see messages.ts).

Session compact: when the token window exceeds its limit, summary.md replaces the oldest part of the conversation.
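The attachment step can be sketched as a small wrapper (the exact reminder format is an assumption based on the article's description):

```typescript
// Sketch of wrapping a recalled memory file as a system-reminder
// message; the exact wrapper format is an assumption.
function toSystemReminder(file: string, contents: string): string {
  return `<system-reminder>\nRelevant memory from ${file}:\n${contents}\n</system-reminder>`;
}
```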

7. Where Compression Happens

Session memory compression is performed by trySessionMemoryCompaction() (see sessionMemoryCompact.ts). The process:

Verify that session memory and compaction are enabled.

Wait for any ongoing session‑memory extraction to finish.

Read summary.md.

If empty, fall back to the traditional compaction.

Calculate the recent‑message window (default minTokens=10000, minTextBlockMessages=5, maxTokens=40000).

Build a new compact summary using the trimmed summary.md plus recent messages, attachments, and hooks.

The algorithm also preserves tool_use / tool_result pairs to keep API invariants intact.
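The windowing and pair-preservation steps above might look roughly like this (message shape, token accounting, and the pairsWithPrev flag are simplified assumptions):

```typescript
// Simplified sketch of the recent-window calculation; message shape,
// token counts, and the pairsWithPrev flag are assumptions.
interface Msg {
  role: "user" | "assistant" | "tool_result";
  tokens: number;
  pairsWithPrev?: boolean; // true for a tool_result answering a tool_use
}

function recentWindow(msgs: Msg[], minTokens = 10_000, maxTokens = 40_000): Msg[] {
  let total = 0;
  let start = msgs.length;
  // Walk backwards, collecting messages until the window holds at least
  // minTokens, without ever exceeding maxTokens.
  for (let i = msgs.length - 1; i >= 0; i--) {
    if (total + msgs[i].tokens > maxTokens) break;
    total += msgs[i].tokens;
    start = i;
    if (total >= minTokens) break;
  }
  // Never split a tool_use/tool_result pair: if the window opens on a
  // tool_result, pull its preceding message into the window too.
  while (start > 0 && msgs[start]?.pairsWithPrev) start--;
  return msgs.slice(start);
}
```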

8. Session Memory Section‑Level Truncation

When summary.md itself is too large, flushSessionSection splits the file into sections, keeps the first half of any oversized section, and inserts a placeholder [... section truncated for length ...]. This coarse‑grained truncation ensures the system never fails due to token limits.
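A minimal sketch of that truncation rule, assuming a character-based size limit stands in for the real token limit:

```typescript
// Sketch of coarse section truncation: keep the first half of any
// oversized section and mark the cut. A character limit stands in for
// the real token limit used by flushSessionSection.
function truncateSection(section: string, maxChars: number): string {
  if (section.length <= maxChars) return section;
  const keep = section.slice(0, Math.floor(section.length / 2));
  return keep + "\n[... section truncated for length ...]";
}
```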

9. KAIROS‑Specific Compression

In KAIROS mode, daily logs are appended throughout the day. At night a “dream” process distils these logs into topic files and updates MEMORY.md. This separates long‑term event aggregation from the per‑turn context compression described earlier.

10. Pros, Cons, and Takeaways

Pros:

Clear layer separation reduces cross‑concern complexity.

File‑first storage offers transparency and low operational cost.

Dynamic manifest‑plus‑side‑query retrieval provides a high ROI for CLI‑style agents.

Preserving a recent‑window and fixing tool invariants prevents subtle bugs during compaction.

Cons:

Recall quality hinges on the description field in front‑matter.

Dual‑channel writes (model + extractor) increase state‑management complexity.

Section‑level truncation is coarse and may discard useful detail.

The MEMORY.md index can still become a bottleneck as the number of files grows.

Practical advice for teams building their own agents:

Identify the exact memory problems you need to solve before designing a monolithic Memory API.

Start with a file‑system implementation; only move to databases or vector stores when scaling demands it.

Embed a high‑quality description at write time to shift retrieval effort upstream.

Maintain a recent‑message window during compression and ensure tool‑use/result pairs stay intact.

11. Key Source Files

Long‑term memory rules & paths: memdir.ts, paths.ts, memoryTypes.ts

Static injection & dynamic recall: claudemd.ts, attachments.ts, findRelevantMemories.ts

Session memory & compression: sessionMemory.ts, prompts.ts, sessionMemoryCompact.ts

Team‑shared memory: teamMemPaths.ts, teamMemorySync/index.ts
Tags: AI Architecture, Agent Memory, Claude Code, Dynamic Retrieval, File‑Based Storage, Session Compression
Written by Architecture and Beyond

Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly available, scalable solutions.
