What to Store and When to Skip: Lessons from Claude Code’s Memory Mechanism

The article dissects Claude Code’s memory system and argues that the real challenge is deciding what information to keep and what to skip. It covers the design principles, index‑content separation, LLM‑based retrieval, expiration handling, and write‑path isolation, then describes the improvements the author applied to their own agent platform.

inShocking

Core Problem of a Memory System

The fundamental issue isn’t "how to store" but "what to store" and "when not to store". In the author’s skill‑recommendation Agent, every "What skills do I have?" query was recorded, producing repeated entries like "user has skill X" that quickly became noise.

Claude Code’s First‑Principle Design

Memories are constrained to four types capturing context NOT derivable from the current project state. Code patterns, architecture, git history, and file structure are derivable (via grep/git/CLAUDE.md) and should NOT be saved.

The single judgment criterion is whether the information can be derived through tool calls such as grep or git log. If it can, it is excluded.

Code patterns? grep can find → do not store.

Git history? git log can find → do not store.

User is a data scientist? No tool can derive → store.

User hates mock tests? Not visible in code → store.

Merge freeze starting Thursday? Not in git → store.
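
The criterion above can be sketched as a small filter. This is only an illustration, not Claude Code's implementation: the `Candidate` shape and the `derivableVia` field are hypothetical names, and in practice the judgement is made by the model following its prompt rather than by a lookup.

```typescript
// Hypothetical shape: a candidate fact plus the tool (if any) that could re-derive it.
type Tool = "grep" | "git log" | "CLAUDE.md";

interface Candidate {
  fact: string;
  derivableVia: Tool | null; // null means no tool call can recover this fact
}

// Keep only facts that cannot be recovered from the current project state.
function shouldStore(c: Candidate): boolean {
  return c.derivableVia === null;
}

const candidates: Candidate[] = [
  { fact: "Code patterns", derivableVia: "grep" },
  { fact: "Git history", derivableVia: "git log" },
  { fact: "User is a data scientist", derivableVia: null },
  { fact: "User hates mock tests", derivableVia: null },
  { fact: "Merge freeze starting Thursday", derivableVia: null },
];

// Only the three non-derivable facts survive the filter.
const stored = candidates.filter(shouldStore).map((c) => c.fact);
```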

Claude Code defines four memory types (src/memdir/memoryTypes.ts):

export const MEMORY_TYPES = ['user', 'feedback', 'project', 'reference'] as const;

Each type addresses a specific core question and typical content:

user: Who is the user? Example – "7 years Java, just moved to frontend".

feedback: What does the user like/dislike? Example – "Don’t summarize at the end of every reply".

project: What is happening now? Example – "2026‑03‑05 start merge freeze".

reference: Where to find external info? Example – "Pipeline bugs in Linear INGEST project".
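
One way to make the four-type constraint hard to bypass is to derive a union type from the constant and validate labels at the boundary. The sketch below assumes a hypothetical `MemoryRecord` shape and `makeRecord` helper; neither is taken from the Claude Code source.

```typescript
const MEMORY_TYPES = ["user", "feedback", "project", "reference"] as const;

// Union derived from the tuple: "user" | "feedback" | "project" | "reference".
type MemoryType = (typeof MEMORY_TYPES)[number];

// Type guard: rejects any label outside the four allowed types.
function isMemoryType(value: string): value is MemoryType {
  return (MEMORY_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical record shape, for illustration only.
interface MemoryRecord {
  type: MemoryType;
  description: string;
}

// Returns null instead of a record when the label is not one of the four types.
function makeRecord(type: string, description: string): MemoryRecord | null {
  return isMemoryType(type) ? { type, description } : null;
}
```

With this guard, a label such as "code_pattern" never becomes a record, which mirrors the idea that anything outside the four types has no place in memory.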

Negative List (What Not to Save)

export const WHAT_NOT_TO_SAVE_SECTION = [
  '- Code patterns, conventions, architecture, file paths... ',
  '- Git history, recent changes... ',
  '- Debugging solutions or fix recipes... ',
  '- Anything already documented in CLAUDE.md files.',
  '- Ephemeral task details: in‑progress work, temporary state...',
  '',
  'These exclusions apply even when the user explicitly asks you to save.'
];

The exclusions take precedence even when a user explicitly asks for something on this list to be saved.

Index‑Content Separation to Prevent Context Bloat

Claude Code stores an index file MEMORY.md (max 200 lines, 25 KB). Each memory occupies one line, e.g.:

- [User preference code style](feedback_code_style.md) — functional first, no class components
- [Pipeline bug tracking](ref_pipeline_bugs.md) — in Linear INGEST project
- [User profile](user_role.md) — 7 years Java, just moved to frontend

The index stays in the system prompt (few hundred tokens); the actual content lives in separate topic files loaded on demand.
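
A minimal sketch of how such an index could be built while respecting both limits. The `IndexEntry` shape, the function names, and the reading of "25 KB" as 25 × 1024 bytes are assumptions made for illustration, not details confirmed by the source.

```typescript
// Limits quoted above for MEMORY.md.
const MAX_INDEX_LINES = 200;
const MAX_INDEX_BYTES = 25 * 1024; // assumes "25 KB" means 25 * 1024 bytes

// Hypothetical entry shape.
interface IndexEntry {
  title: string;
  file: string;
  summary: string;
}

// Render one entry in the "- [title](file) — summary" form shown above.
function formatIndexLine(e: IndexEntry): string {
  return `- [${e.title}](${e.file}) — ${e.summary}`;
}

// UTF-8 byte length computed by hand so the sketch needs no runtime-specific API.
function utf8Bytes(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) n += 1;
    else if (c < 0x800) n += 2;
    else if (c >= 0xd800 && c <= 0xdbff) { n += 4; i++; } // surrogate pair
    else n += 3;
  }
  return n;
}

// Build the index text, stopping before either limit would be exceeded.
function buildIndex(entries: IndexEntry[]): string {
  const lines: string[] = [];
  let bytes = 0;
  for (const e of entries) {
    const line = formatIndexLine(e);
    const size = utf8Bytes(line) + 1; // +1 for the newline
    if (lines.length >= MAX_INDEX_LINES || bytes + size > MAX_INDEX_BYTES) break;
    lines.push(line);
    bytes += size;
  }
  return lines.join("\n");
}
```

Dropping overflow entries silently is a simplification here; a real system would more likely consolidate or prune old entries before reaching the cap.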

LLM‑Based Memory Selection Instead of Vector Search

Most memory systems use vector search (embedding + cosine similarity). Claude Code uses an LLM selector:

1. scanMemoryFiles()      ← read first 30 lines of all .md files
2. formatMemoryManifest() ← create list "- [type] filename (timestamp): description"
3. selectRelevantMemories() ← call Sonnet sideQuery, pick ≤5 memories
4. inject selected memories

The selector prompt reads:

You are selecting memories that will be useful to Claude Code as it processes a user's query… Return a list of filenames for the memories that will clearly be useful (up to 5). Only include memories that you are certain will be helpful. If you are unsure, do not include it. Be selective and discerning.

Reason: semantic similarity does not guarantee actual relevance. An LLM can reason about hidden dependencies that vector similarity misses, for example a bug‑fix request that needs earlier lessons about database mocks. The selector is also given the recently used tools so it can prioritize tool‑specific notes (e.g., a Jira timeout tip). Retrieval costs about $0.001‑$0.003 per call, far cheaper than running a full vector‑DB infrastructure.
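
The manifest-and-select steps can be sketched as follows. This is an illustration under stated assumptions rather than the actual source: the `MemoryHeader` shape, the JSON-array reply format, and the `ask` callback (standing in for the side query to the model) are all hypothetical.

```typescript
// Hypothetical header shape, mirroring "- [type] filename (timestamp): description".
interface MemoryHeader {
  type: string;
  filename: string;
  timestamp: string;
  description: string;
}

const MAX_SELECTED = 5;

function formatManifest(headers: MemoryHeader[]): string {
  return headers
    .map((h) => `- [${h.type}] ${h.filename} (${h.timestamp}): ${h.description}`)
    .join("\n");
}

// Parse the selector's reply, assumed here to be a JSON array of filenames.
// Unknown or duplicate filenames are dropped and the result is capped at MAX_SELECTED.
function parseSelection(reply: string, headers: MemoryHeader[]): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply);
  } catch {
    return []; // unparseable reply: inject nothing rather than guess
  }
  if (!Array.isArray(parsed)) return [];
  const known = headers.map((h) => h.filename);
  const picked: string[] = [];
  for (const item of parsed) {
    if (typeof item === "string" && known.indexOf(item) !== -1 && picked.indexOf(item) === -1) {
      picked.push(item);
    }
  }
  return picked.slice(0, MAX_SELECTED);
}

// `ask` stands in for the model call; its signature is an assumption.
async function selectRelevant(
  query: string,
  headers: MemoryHeader[],
  ask: (prompt: string) => Promise<string>,
): Promise<string[]> {
  const prompt = `Query: ${query}\n\nAvailable memories:\n${formatManifest(headers)}`;
  return parseSelection(await ask(prompt), headers);
}
```

Validating the reply against the known filenames matters because the selector is a language model: it can return a name that does not exist, and an empty selection is a safer fallback than a wrong one.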

Expiration Management

The function memoryAge converts a modification timestamp into a human‑friendly label ("today", "yesterday", "47 days ago").

Models are poor at date arithmetic — a raw ISO timestamp doesn't trigger staleness reasoning the way "47 days ago" does.

If a memory is older than one day, a warning is appended, e.g.:

if (d <= 1) return ''  // no warning for today/yesterday
return `This memory is ${d} days old. Memories are point‑in‑time observations, not live state — claims about code behavior or file:line citations may be outdated.`

This guards against stale file‑line references that could mislead the LLM.
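
A minimal sketch of the age label and the warning together. Passing `nowMs` explicitly, counting age in elapsed 24-hour periods rather than calendar days, and shortening the warning text are simplifications chosen here; the real `memoryAge` may differ.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed since the file was last modified (never negative).
function ageInDays(mtimeMs: number, nowMs: number): number {
  return Math.max(0, Math.floor((nowMs - mtimeMs) / MS_PER_DAY));
}

// Human-friendly label that the model can reason about directly.
function memoryAge(mtimeMs: number, nowMs: number): string {
  const d = ageInDays(mtimeMs, nowMs);
  if (d === 0) return "today";
  if (d === 1) return "yesterday";
  return `${d} days ago`;
}

// Empty string for today/yesterday, otherwise a (shortened) staleness warning.
function stalenessWarning(mtimeMs: number, nowMs: number): string {
  const d = ageInDays(mtimeMs, nowMs);
  if (d <= 1) return "";
  return `This memory is ${d} days old. Verify it against the current state before relying on it.`;
}
```

Taking the current time as a parameter keeps the functions deterministic, which makes the thresholds easy to test.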

Three Mutually Exclusive Write Paths

Path A: User explicitly says “remember this”. The main Agent writes directly – highest trust.

Path B: After each turn, a sandboxed sub‑agent extracts memories. Permissions: read any file, execute read‑only shell commands, write to the memory directory; cannot write project code or execute destructive commands.

Path C: Auto‑dream – after 24 h and 5 sessions, logs are consolidated into thematic files.

The code ensures only one path writes per turn:

if (hasMemoryWritesSince(messages, lastMemoryMessageUuid)) {
  // Main Agent already wrote → skip background extraction
  return
}
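
The article does not show how `hasMemoryWritesSince` is implemented. One plausible shape, offered purely as a guess, is a scan of the messages that follow the last processed one for writes under the memory directory. The `Message` shape, the `writtenPaths` field, and the `memory/` prefix below are all hypothetical.

```typescript
// Hypothetical message shape: each message records the files it wrote.
interface Message {
  uuid: string;
  writtenPaths: string[];
}

const MEMORY_DIR = "memory/"; // assumed location, for illustration only

// True if any message after `sinceUuid` wrote a file under the memory directory.
// If `sinceUuid` is undefined or not found, every message is checked.
function hasMemoryWritesSince(messages: Message[], sinceUuid?: string): boolean {
  let start = 0;
  messages.forEach((m, i) => {
    if (m.uuid === sinceUuid) start = i + 1;
  });
  return messages
    .slice(start)
    .some((m) => m.writtenPaths.some((p) => p.indexOf(MEMORY_DIR) === 0));
}
```

Whatever the real implementation looks like, the design point is the same: the background extractor defers to the main Agent, so a fact the user explicitly asked to remember is never written twice.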

Applying the Design to Our Own Agent Platform

Our platform originally recorded every turn after the Agent replied:

user says → Agent replies → store conversation in memory service

We refactored to a Retrieve‑Record loop:

user says → retrieve existing memories → Agent replies → LLM decides SKIP / UPDATE / CREATE

The LLM receives the current QA pair plus any relevant retrieved memories and decides whether to skip storing (duplicate), update an existing entry, or create a new one. This eliminated the repeated "what skills do I have" noise.
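
The record step can be sketched as a small decision parser plus an apply function. The JSON reply format, the `Decision` type, and the in-memory `Store` (a stand-in for the memory service) are assumptions for illustration and not the platform's actual code.

```typescript
type Decision =
  | { action: "SKIP" }
  | { action: "UPDATE"; id: string; content: string }
  | { action: "CREATE"; content: string };

// Parse the model's reply, assumed to be JSON such as
// {"action":"UPDATE","id":"m1","content":"..."}.
// Anything malformed falls back to SKIP so that a bad reply never adds noise.
function parseDecision(reply: string): Decision {
  let raw: any;
  try {
    raw = JSON.parse(reply);
  } catch {
    return { action: "SKIP" };
  }
  if (raw === null || typeof raw !== "object") return { action: "SKIP" };
  if (raw.action === "CREATE" && typeof raw.content === "string") {
    return { action: "CREATE", content: raw.content };
  }
  if (raw.action === "UPDATE" && typeof raw.id === "string" && typeof raw.content === "string") {
    return { action: "UPDATE", id: raw.id, content: raw.content };
  }
  return { action: "SKIP" };
}

// Simple in-memory stand-in for the memory service: id -> content.
type Store = Record<string, string>;

function applyDecision(store: Store, d: Decision, newId: string): void {
  if (d.action === "CREATE") {
    store[newId] = d.content;
  } else if (d.action === "UPDATE" && Object.prototype.hasOwnProperty.call(store, d.id)) {
    store[d.id] = d.content;
  }
  // SKIP, or UPDATE for an unknown id: leave the store untouched.
}
```

Defaulting to SKIP on any parse failure follows the article's thesis that wrongly storing something costs more than occasionally missing it.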

During implementation we hit a streaming‑response pitfall: the final chunk of a streamed LLM response is often empty, so taking only the last chunk yields no text. The fix was to concatenate all chunks before processing.
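
The pitfall and the fix can be shown side by side. The `Chunk` shape is hypothetical, since streaming SDKs differ in their field names, and the chunks are modelled as a plain array to keep the sketch short.

```typescript
// Hypothetical chunk shape; real SDKs differ in field names.
interface Chunk {
  text?: string;
}

// The pitfall: reading only the final chunk, which is often empty.
function lastChunkText(chunks: Chunk[]): string {
  const last = chunks.length > 0 ? chunks[chunks.length - 1] : undefined;
  return last?.text ?? "";
}

// The fix: concatenate the text of every chunk before processing.
function collectText(chunks: Chunk[]): string {
  return chunks.map((c) => c.text ?? "").join("");
}

// A stream that ends with an empty terminator chunk.
const streamed: Chunk[] = [{ text: "SK" }, { text: "IP" }, {}];
```

Here `lastChunkText(streamed)` yields an empty string while `collectText(streamed)` yields the full decision text.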

Gap Analysis Between Our System and Claude Code

Only store non‑derivable info: enforced only as a prompt‑level constraint, with no enforcement in code.

Negative list: SKIP rules implemented.

Write‑path isolation: satisfied by the single‑process architecture.

LLM semantic routing: not yet; still using plain top‑K vector retrieval.

Age tags: timestamps exist but are not used for staleness warnings.

Feedback with Why + How: not yet; feedback is currently treated like ordinary memory.

Session‑memory safety net: missing; key information judged irrelevant is not protected during compression.

Periodic memory consolidation: missing; no cross‑session fragment merging.

Practical Next Steps

Expiration tags: Append "X days ago" to retrieved records using the updated_at field; minimal code change, prevents blind trust in stale memories.

LLM‑based selection: After vector top‑K, run an LLM filter to reduce results to ≤5, costing ~$0.001‑$0.003 per call but improving relevance.

Real‑World Takeaways

The quality of a memory system depends more on what you exclude than on what you include. Adding a strict SKIP list dramatically reduced noise. Prompt placement matters: isolated sections for rules achieve higher evaluation scores than bullet‑embedded rules. Finally, semantic similarity from embeddings is not equivalent to actual usefulness; an LLM‑driven router can bridge that gap.

Analysis based on Claude Code source (src/memdir/, src/services/extractMemories/) and hands‑on experience integrating a memory service into our Agent platform.

