How Claude Code’s Memory System Works: From SHA‑256 Storage to Coalescing Extraction
This article dissects Claude Code’s Memory subsystem, explaining the distinction between Session logs and persistent Memory, the SHA‑256‑based storage layout, file indexing, four memory types, prompt injection steps, two write pathways, the ExtractionCoordinator’s coalescing strategy, and how to explain the design in interviews.
Session vs. Memory
Claude Code runs each conversation inside a query_loop. User inputs become UserMessage objects and model replies become AssistantMessage. When the REPL exits the message list is written to a JSONL file via save_session() and can be re‑loaded with --resume. This log is called the Session – it records everything that happened but does not decide which facts are worth keeping.
The Memory subsystem is a long‑term note‑taking channel. It extracts salient insights from sessions and stores them as independent markdown files so the model can recall useful facts in future REPL starts.
Storage architecture and why SHA‑256 is used
All memory files live under the directory ~/.claude/projects/{project_id}/memory/. The project_id is the first 12 hexadecimal characters of the SHA‑256 hash of the current working directory. It is generated in cc/memory/session_memory.py:18‑28 by the function _project_id():
```python
def _project_id(cwd: str) -> str:
    # deterministic, collision-resistant identifier
    return hashlib.sha256(cwd.encode()).hexdigest()[:12]
```

Python's built‑in hash() has been salted by default since version 3.3 (controlled by PYTHONHASHSEED), so it would produce different values for the same path across processes. SHA‑256 guarantees the same directory name after every restart while keeping the name short: 12 hex characters encode 48 bits.
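The difference is easy to demonstrate. The sketch below (the path is illustrative) spawns two fresh interpreters to show that hash() disagrees with itself across processes while the SHA‑256 prefix stays stable:

```python
import hashlib
import subprocess
import sys

cwd = "/home/alice/project"

# SHA-256 yields the same 12-character id in every process:
stable = hashlib.sha256(cwd.encode()).hexdigest()[:12]

# hash() is salted per interpreter, so two fresh runs usually disagree:
snippet = f"print(hash({cwd!r}))"

def run() -> str:
    return subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    ).stdout.strip()

print(stable)             # identical on every execution
print(run() == run())     # usually False unless PYTHONHASHSEED is pinned
```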
File types in the memory directory
- `MEMORY.md` – an index file where each line follows the pattern `- safe_name -- description`. The file is fully loaded when building the system prompt; only the first 200 lines are kept (constant MAX_ENTRYPOINT_LINES = 200 in sections.py:179).
- `*.md` – a separate markdown file for each memory entry. The file starts with a YAML front‑matter block containing name, description, and type, followed by the note body. Example:
```
---
name: feedback_no_mock
description: Do not mock the database (by: Wu)
type: feedback
---
Do not mock the database. A previous migration failed because of mock/production differences.
```

File names are sanitized in session_memory.py:91 by replacing any character that is not alphanumeric, -, or _ with an underscore, which prevents path injection and filesystem compatibility problems.
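That sanitization amounts to a one-line regex substitution. A minimal sketch (the helper's exact name is not shown in the article, so safe_name here is illustrative):

```python
import re

def safe_name(name: str) -> str:
    # Keep alphanumerics, '-' and '_'; replace everything else with '_'
    # so names cannot escape the memory directory or break the filesystem.
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)

print(safe_name("feedback: no/mock!"))   # feedback__no_mock_
```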
Four memory categories
- user – stores the user's role, goals, or domain knowledge. Example: "User is a data scientist interested in observability."
- feedback – stores corrections or policies supplied by the user. Example: "Do not mock the database; it caused a production incident."
- project – stores time‑sensitive project information such as decisions or deadlines. Example: "Thursday merge freeze, team moving to release branch."
- reference – stores pointers to external systems (e.g., issue tracker IDs). Example: "Pipeline bug tracked in Linear project INGEST."
How memory is injected into the system prompt (step 10)
The REPL startup routine _build_system() (defined in main.py:285) performs three actions:
1. Obtain the memory directory path and ensure the directory exists (main.py:310‑313). The helper get_memory_dir() only reads; the directory is created here so that the Write tool can assume it already exists.
2. Load the MEMORY.md index via load_memory_index() (session_memory.py:179‑198). If the file is missing or empty, None is returned.
3. Assemble the full system prompt (builder.py:74‑135). When memory_dir is non‑empty, build_memory_prompt() appends a "Current memories" block after the environment information and SUMMARIZE_TOOL_RESULTS but before the static CLAUDE.md content. Example snippet:
```
You have a persistent, file-based memory system at /Users/.../.claude/projects/.../memory/.
This directory already exists – write directly, no need for mkdir.

## Current memories
- [feedback_no_mock](feedback_no_mock.md) -- Do not mock the database
- [user_role](user_role.md) -- User is a data scientist focused on observability
... (full MEMORY.md content)
```

The injected memory block is built only once at REPL start; new memories created during the session become visible only after the next REPL start (or after a manual model switch via /model, which triggers another call to _build_system() at main.py:711).
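Under the names the article quotes (load_memory_index, MAX_ENTRYPOINT_LINES), the index loader can be sketched as follows. This is an illustration of the described behavior, not the actual implementation:

```python
from pathlib import Path
from typing import Optional

MAX_ENTRYPOINT_LINES = 200  # cap quoted from sections.py

def load_memory_index(memory_dir: Path) -> Optional[str]:
    # Return the first 200 lines of MEMORY.md, or None if the file
    # is missing or effectively empty.
    index = memory_dir / "MEMORY.md"
    if not index.exists():
        return None
    text = index.read_text(encoding="utf-8")
    if not text.strip():
        return None
    return "\n".join(text.splitlines()[:MAX_ENTRYPOINT_LINES])
```

The cap means an index that grows past 200 entries silently loses its tail at prompt-build time, which is worth keeping in mind when curating MEMORY.md.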
Two write paths for memory
Path A – Explicit model‑driven writes
When the user says “remember this” or the model decides a fact is worth persisting, the prompt instructs the model to:
Use the Write tool to create <safe_name>.md with the required front‑matter.
Use the Edit tool to update MEMORY.md with a new index entry.
This path is entirely prompt‑driven; no special code is required. The advantage is immediacy – the model can Read the newly written file within the same conversation, although the reference appears only in tool results, not in the system prompt.
Path B – Implicit background extraction
After each REPL round, main.py launches an asynchronous task that extracts memories without blocking the user:
```python
# fire-and-forget, non-blocking REPL
task = asyncio.create_task(_bg_extract(messages, memory_dir, model))
_bg_tasks.add(task)                        # keep a reference to avoid GC
task.add_done_callback(_bg_tasks.discard)
```

Non‑blocking: asyncio.create_task() lets the user continue typing.
Low‑cost model call: Uses max_tokens=1024 because the extraction result is a short JSON.
GC safety: A global _bg_tasks set holds references to prevent premature garbage collection.
Prompt stability: Extracted memories are written to disk but do not refresh the current system prompt; they become visible only on the next REPL start.
ExtractionCoordinator – coalescing concurrency control
Path B can generate many overlapping extraction requests if the user types quickly. ExtractionCoordinator (extractor.py:253‑328) implements a coalescing strategy rather than a simple debounce.
State variables
```python
class ExtractionCoordinator:
    _running: bool = False     # extraction in progress?
    _dirty: bool = False       # new messages arrived during extraction?
    _watermark: int = 0        # message count at last extraction
```

Workflow
1. If request_extraction() is called while _running is True, set _dirty = True and return immediately.
2. If no extraction is running, acquire a lock and enter a loop:
   - Clear _dirty.
   - Compute increment = current_message_count - _watermark.
   - If increment >= MIN_NEW_MESSAGES (threshold 4), call extract_memories().
   - Update _watermark.
   - If _dirty is False, exit; otherwise repeat to handle newly arrived messages.
This coalescing guarantees that every final state is processed: as long as _dirty was set at least once, another extraction run will occur after the current one finishes, avoiding the loss of intermediate requests that a pure debounce would cause.
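The workflow above can be condensed into a runnable sketch. This is an illustration of the described coalescing loop, not the actual class; extract stands in for extract_memories(), count_messages returns the current message count, and the watermark here advances only after a real extraction so that small increments keep accumulating toward the threshold:

```python
import asyncio

MIN_NEW_MESSAGES = 4  # threshold quoted in the article

class ExtractionCoordinator:
    def __init__(self, extract, count_messages):
        self._extract = extract
        self._count = count_messages
        self._lock = asyncio.Lock()
        self._running = False   # extraction in progress?
        self._dirty = False     # new messages arrived during extraction?
        self._watermark = 0     # message count at last extraction

    async def request_extraction(self):
        if self._running:
            # Coalesce: the active run will loop once more before exiting.
            self._dirty = True
            return
        self._running = True
        try:
            async with self._lock:
                while True:
                    self._dirty = False
                    n = self._count()
                    if n - self._watermark >= MIN_NEW_MESSAGES:
                        await self._extract()
                        self._watermark = n  # advance after a real extraction
                    if not self._dirty:
                        break  # no request arrived mid-run; final state done
        finally:
            self._running = False
```

Because the dirty flag is re-checked after every extraction, a request that arrives mid-run is never dropped; a pure debounce would discard it if the timer had already fired.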
Memory vs. Compact (context compression)
The Compact component compresses older conversation history into a CompactBoundaryMessage summary to free token budget. Compression is lossy – detailed code snippets, tool results, and reasoning may disappear. SUMMARIZE_TOOL_RESULTS (in sections.py:162) nudges the model to embed key information into the reply before compression, but it only protects the current session.
Memory, by contrast, writes permanent .md files that survive across sessions. The two outputs differ:
Compact output (summary):
→ Short‑term, may be re‑compressed, eventually archived.

Memory output (.md files):
→ Permanently stored, never auto‑deleted, re‑loaded on the next REPL start.

The design philosophy: compression may discard implementation details, but Memory must retain decisive facts (e.g., user preferences, the root cause of a bug).
Pessimistic trust strategy
The system teaches the model to treat memories as clues, not absolute facts. TRUSTING_RECALL_SECTION (sections.py:287‑297) states:
A memory that names a specific function, file, or flag is a claim that it existed when the memory was written. It may have been renamed, removed, or never merged.
Verification rules:
If a memory mentions a file path, first Read or Glob to confirm existence.
If it mentions a function or flag, Grep the codebase to verify.
Any action based on a memory must be preceded by a current‑state check; outdated memories are updated or deleted.
Summary of the Memory subsystem
The Claude Code Memory system combines deterministic SHA‑256 directory naming, a clear index/content file split, four well‑defined memory types, prompt‑driven injection at REPL start, dual write pathways (explicit model writes and background extraction), and a coalescing extraction coordinator to ensure reliable cross‑session knowledge while avoiding over‑reliance on potentially stale data.
Wu Shixiong's Large Model Academy