Beyond Memory: How Context Substrates Are Redefining AI Agents
A comprehensive analysis of over 900 GitHub repositories reveals two distinct paradigms for agent memory—backend storage and context substrates—highlighting their technical differences, strengths, limitations, and the emerging shift toward context engineering for long‑running AI agents.
On GitHub, more than 450 repositories are tagged "agent-memory" and another 460+ are tagged "context-management". After surveying these projects, we discovered two fundamentally different paradigms that most developers do not clearly distinguish.
Two Camps
Camp 1: Backend Storage – Tools extract facts from conversations, store them in vector databases, and retrieve relevant information on demand.
Camp 2: Context Substrates – Tools maintain structured, human‑readable context that accumulates across sessions; the agent reads, writes, and evolves this context without any extraction step.
Most projects in the ecosystem belong to Camp 1, but Camp 2 is the architecture that can scale to continuous, multi-session, multi-project agents, and the terminology is already shifting in that direction.
Camp 1: Backend Storage
Mem0 – 53.1k stars
Leader in adoption. Provides four operations (add, search, update, delete). Extracts facts from dialogue, stores them at three levels (user, session, agent), and retrieves via hybrid search. Simple integration via Python and TypeScript SDKs.
Limitation: Memory consists of flat entries with no relationships, and extraction runs an LLM call on every write, so quality depends on the extraction prompt. Stored facts never evolve, so outdated information coexists with newer facts.
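The flat-entry model and its four operations can be sketched in a few lines. This is not Mem0's actual SDK, just a stdlib toy that makes the limitation visible: entries have no relationships, so a stale fact and its replacement simply coexist.

```python
import uuid

class FlatFactStore:
    """Toy Camp 1 store: flat entries, no relationships.
    Illustrative only -- not Mem0's real API."""

    def __init__(self):
        self.facts = {}  # id -> {"text": ..., "level": ...}

    def add(self, text, level="user"):
        fact_id = str(uuid.uuid4())
        self.facts[fact_id] = {"text": text, "level": level}
        return fact_id

    def search(self, query):
        # Stand-in for hybrid vector search: naive keyword match.
        q = query.lower()
        return [f["text"] for f in self.facts.values() if q in f["text"].lower()]

    def update(self, fact_id, text):
        self.facts[fact_id]["text"] = text

    def delete(self, fact_id):
        del self.facts[fact_id]

store = FlatFactStore()
store.add("User moved to Berlin")
store.add("User lives in Madrid")  # stale fact; nothing links or supersedes it
print(store.search("user"))        # both facts come back together
```

Nothing in the store knows that the second entry contradicts the first; resolving that is left to the retrieval prompt.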
MemPalace – 46.2k stars
Local‑first verbatim memory. Stores dialogues verbatim and organizes them into wings (entities), rooms (topics), and drawers (raw content) using ChromaDB.
Benchmark: LongMemEval recall 96.6% with pure semantic search, 98.4% with hybrid pipeline, >99% after LLM re‑ranking.
Limitation: Linear growth with usage; no compression or synthesis. Ideal for "find what I said three weeks ago" but not for summarizing the status of multiple ongoing projects.
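The wings/rooms/drawers layout amounts to a verbatim hierarchy. A minimal sketch (the function names are illustrative, not MemPalace's API) shows both the strength and the limitation: everything is kept, nothing is synthesized.

```python
from collections import defaultdict

# Toy layout: wing (entity) -> room (topic) -> list of verbatim utterances.
palace = defaultdict(lambda: defaultdict(list))

def file_away(wing, room, utterance):
    palace[wing][room].append(utterance)  # stored verbatim, never compressed

def recall(wing, room):
    return palace[wing][room]  # linear growth: every utterance survives

file_away("project-x", "deadlines", "Ship the beta by March 3.")
file_away("project-x", "deadlines", "Beta slipped to March 10.")
print(recall("project-x", "deadlines"))  # both entries, no summary of "status"
```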
Supermemory – 21.8k stars
Positions itself as "memory is not RAG". Introduces temporal awareness—new facts replace old ones (e.g., moving city). Expired facts are automatically forgotten. Retrieval latency ~50 ms. Connectors for Google Drive, Gmail, Notion, OneDrive, GitHub; multimodal support for PDFs, images, video, code.
Most Camp 1 tools treat facts as permanent; Supermemory treats them as evolving, bringing Camp 1 closer to state‑aware storage.
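The temporal-awareness idea reduces to a simple invariant: a new value for the same attribute supersedes the old one, which drops out of retrieval. A stdlib sketch of that invariant (not Supermemory's API):

```python
class TemporalStore:
    """Sketch of temporal awareness: asserting a fact for an existing
    (subject, attribute) pair forgets the previous value."""

    def __init__(self):
        self.current = {}    # (subject, attribute) -> value
        self.forgotten = []  # expired facts, excluded from retrieval

    def assert_fact(self, subject, attribute, value):
        key = (subject, attribute)
        if key in self.current:
            self.forgotten.append((key, self.current[key]))
        self.current[key] = value

    def get(self, subject, attribute):
        return self.current.get((subject, attribute))

mem = TemporalStore()
mem.assert_fact("alice", "city", "Madrid")
mem.assert_fact("alice", "city", "Berlin")  # moving city replaces the old fact
print(mem.get("alice", "city"))             # only the current value is live
```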
Other Camp 1 tools (brief)
Cognee (15.4k stars) combines vector search with a graph DB for relational reasoning. Memori (13.3k stars) intercepts LLM API calls to capture execution context, achieving 81.95% on LoCoMo with only 4.97% of full‑context tokens. Additional projects include AgentScope, MemOS, EverOS, MIRIX, SimpleMem, Memobase.
Common Traits of Camp 1 Tools
All follow the same loop: dialogue occurs → system extracts or stores content → facts enter a database (vector, graph, or both) → next dialogue retrieves relevant facts and injects them.
The intelligence lies in extraction and retrieval; the memory system operates behind the scenes, trusted to remember the right things at the right time.
This approach works well for fact recall (e.g., "what did I say about X?" or "what are the user preferences?") but does not address the broader problem of building a continuously evolving context.
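The whole Camp 1 loop fits in a few functions. In this sketch the LLM extraction and vector search are replaced by trivial stand-ins; only the shape of the pipeline is the point.

```python
def extract_facts(dialogue):
    # Stand-in for an LLM extraction call: keep lines marked as facts.
    return [line for line in dialogue if line.startswith("FACT:")]

database = []  # stand-in for a vector or graph store

def store(facts):
    database.extend(facts)

def retrieve(query):
    # Stand-in for semantic retrieval: keyword match.
    return [f for f in database if query.lower() in f.lower()]

def inject(prompt, facts):
    return "Known facts:\n" + "\n".join(facts) + "\n\n" + prompt

# One turn of the loop: dialogue -> extract -> store -> retrieve -> inject.
store(extract_facts(["hello there", "FACT: user prefers dark mode"]))
print(inject("Set up my editor.", retrieve("dark mode")))
```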
Camp 2: Context Substrates
OpenClaw – 358k stars
Uses pure Markdown files: MEMORY.md for long-term storage, daily notes (YYYY-MM-DD.md) for running context, and DREAMS.md for integrated summaries.
Philosophy: "The model only 'remembers' what is saved to disk; there is no hidden state." No vector DB, no extraction pipeline; the file‑agent reads and writes directly.
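The files-as-memory idea is almost embarrassingly simple to sketch. The paths below mirror the article's description, but the exact on-disk layout is an assumption, not OpenClaw's real format.

```python
from datetime import date
from pathlib import Path
import tempfile

# Assumed layout: one workspace dir with MEMORY.md and daily YYYY-MM-DD.md notes.
workspace = Path(tempfile.mkdtemp())

def daily_note():
    return workspace / f"{date.today():%Y-%m-%d}.md"

def log(event):
    # Running context for today; append-only, human-readable.
    with daily_note().open("a") as f:
        f.write(f"- {event}\n")

def remember(fact):
    # Long-term memory; the agent (or a human) edits this file directly.
    with (workspace / "MEMORY.md").open("a") as f:
        f.write(f"- {fact}\n")

log("Investigated the flaky build")
remember("Prefers rebase over merge")
print((workspace / "MEMORY.md").read_text())
```

There is no index to rebuild and no extraction to rerun: what is on disk is the memory.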
Key feature – "Dreaming": a background process that consolidates daily notes into long‑term memory in three stages:
Shallow Sleep – groups daily notes into coherent blocks.
REM – weighted recall that promotes frequently accessed information to "persistent truth".
Deep Sleep – safely promotes selected blocks to MEMORY.md, avoiding duplication.
Entries must pass thresholds (score ≥ 0.8, recall ≥ 3, unique queries ≥ 3) and are scored by six weighted signals (relevance 0.30, frequency 0.24, query diversity 0.15, recency 0.15, integration 0.10, conceptual richness 0.06).
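The promotion rule above is just a weighted sum plus three thresholds. The weights and thresholds come from the article; the signal names and the assumption that each signal is normalized to [0, 1] are mine.

```python
WEIGHTS = {"relevance": 0.30, "frequency": 0.24, "query_diversity": 0.15,
           "recency": 0.15, "integration": 0.10, "conceptual_richness": 0.06}

def dream_score(signals):
    # Weighted sum of the six signals (weights sum to 1.00).
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

def promote(entry):
    # Thresholds from the article: score >= 0.8, recall >= 3, unique queries >= 3.
    return (dream_score(entry["signals"]) >= 0.8
            and entry["recall_count"] >= 3
            and entry["unique_queries"] >= 3)

entry = {"signals": {"relevance": 0.9, "frequency": 0.8, "query_diversity": 0.9,
                     "recency": 0.7, "integration": 0.8, "conceptual_richness": 1.0},
         "recall_count": 5, "unique_queries": 4}
print(round(dream_score(entry["signals"]), 3), promote(entry))
```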
Zep – 4.4k stars
Rebranded from "memory" to "context engineering", signaling a market shift. Uses a temporal knowledge graph (Graphiti) with valid_at and invalid_at timestamps, auto-extracts relations, and returns pre-formatted context blocks optimized for LLM consumption. Retrieval latency <200 ms; SOC 2 Type II and HIPAA compliant.
Zep sits between Camp 1 and Camp 2, still performing extraction and retrieval but adopting the new terminology.
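The valid_at/invalid_at pattern is worth a concrete look: each edge carries a validity interval, and queries filter to facts that were true at a given moment. The field names mirror the article; the real Graphiti API differs.

```python
from datetime import datetime

# Edges with validity intervals; invalid_at=None means "still true".
edges = [
    {"fact": "Alice works at Acme", "valid_at": datetime(2023, 1, 1),
     "invalid_at": datetime(2024, 6, 1)},
    {"fact": "Alice works at Beta Corp", "valid_at": datetime(2024, 6, 1),
     "invalid_at": None},
]

def facts_valid_at(when):
    # A fact is retrievable only inside its [valid_at, invalid_at) window.
    return [e["fact"] for e in edges
            if e["valid_at"] <= when
            and (e["invalid_at"] is None or when < e["invalid_at"])]

print(facts_valid_at(datetime(2025, 1, 1)))  # only the current employer
```

Unlike simple replacement, the superseded fact is never deleted, so "where did Alice work in 2023?" remains answerable.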
Thoth – 145 stars
Personal knowledge graph with ten entity types and 67 relation types. Uses FAISS for vector search and performs a single‑hop graph expansion before each LLM call.
Features a four‑stage "dream cycle":
Merge duplicates above 0.93 similarity.
Enrich with contextual description.
Infer relations between co‑occurring entities.
Decay confidence for relations older than 90 days.
Three anti‑contamination layers prevent cross‑entity fact leakage. This is the most complex automatic memory refinement we observed.
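Two of the four stages are easy to make concrete. The sketch below implements stage 1 (merge above 0.93 cosine similarity) and stage 4 (decay confidence past 90 days); the decay factor of 0.5 and all data shapes are assumptions, not Thoth's implementation.

```python
from datetime import datetime, timedelta

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

def merge_duplicates(entities, threshold=0.93):
    # Stage 1: fold an entity into an earlier one if embeddings are near-identical.
    merged = []
    for e in entities:
        for kept in merged:
            if cosine(e["vec"], kept["vec"]) > threshold:
                kept["aliases"].append(e["name"])
                break
        else:
            merged.append({**e, "aliases": []})
    return merged

def decay(relations, now, max_age=timedelta(days=90), factor=0.5):
    # Stage 4: relations unseen for more than 90 days lose confidence.
    for r in relations:
        if now - r["last_seen"] > max_age:
            r["confidence"] *= factor
    return relations
```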
TrustGraph – 2.0k stars
Introduces a "context core" – a portable, versioned bundle containing domain schema, knowledge graph, vector embeddings, evidence sources, and retrieval strategies. Treats context as code: versioned, testable, promotable, and roll‑backable.
Enables handing a context core to a new agent, forking it for experiments, and merging changes back.
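"Context as code" maps naturally onto a versioned value type. The dataclass below is a sketch of the idea under my own naming, not TrustGraph's format: forking copies the bundle, merging folds changes back and bumps the version.

```python
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class ContextCore:
    """Sketch of a portable, versioned context bundle (names are illustrative)."""
    version: int = 1
    schema: dict = field(default_factory=dict)
    graph: dict = field(default_factory=dict)      # stand-in for the knowledge graph
    strategies: list = field(default_factory=list)  # retrieval strategies

    def fork(self):
        clone = deepcopy(self)   # experiment on a copy; original stays intact
        clone.version += 1
        return clone

    def merge(self, other):
        self.graph.update(other.graph)  # naive merge; real tools diff and validate
        self.version = max(self.version, other.version) + 1

core = ContextCore(graph={"Acme": {"type": "customer"}})
fork = core.fork()
fork.graph["Beta"] = {"type": "supplier"}  # changes live only in the fork...
core.merge(fork)                           # ...until explicitly merged back
print(core.version, sorted(core.graph))
```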
MemSearch (by Zilliz) – 1.2k stars
Markdown‑first memory system backed by Milvus. Files are the source of truth; vector search is just an access layer. Provides three‑layer progressive disclosure (semantic chunk → full section → raw record) and hybrid search (dense vectors + BM25 + RRF re‑ranking).
Highlights that the truth resides in files, with vector indexes serving as a reconstructable overlay.
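Progressive disclosure is a retrieval contract, not a storage trick: the same query can return a matching line, its full section, or the raw file. The sketch below uses keyword matching over a Markdown string; the real MemSearch pipeline runs dense vectors, BM25, and RRF over Milvus.

```python
RAW = """# Project notes

## Deployment
We deploy on Fridays. Rollbacks use the blue/green switch.

## Testing
Integration tests run nightly.
"""

def disclose(query, layer):
    sections = RAW.split("\n## ")
    hit = next(s for s in sections if query.lower() in s.lower())
    if layer == "chunk":      # layer 1: just the best-matching line
        return next(l for l in hit.splitlines() if query.lower() in l.lower())
    if layer == "section":    # layer 2: the full section around the match
        return hit
    return RAW                # layer 3: the raw record (the file itself)

print(disclose("rollbacks", "chunk"))
```

The point is that every layer reconstructs from the same file: delete the index and nothing is lost.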
Common Traits of Camp 2 Tools
The loop differs:
Agent reads structured context → works within that context → writes back to the structured context → next session sees richer context.
Intelligence lies in accumulation; context is memory. Because the context is stored as readable files (Markdown, knowledge graphs, containers), humans can inspect, edit, and correct it, fully understanding what the agent knows.
Camp 1 optimizes "recall": can the system find the correct fact? Camp 2 optimizes "composition": does the system improve over time as context grows?
Future Direction
Running a 24/7 agent makes the pattern clear: memory and context solve different problems. An agent does not need to "remember" preferences; it needs to operate within a context that includes active projects, collaborators, recent decisions, and yesterday's events, and that context must become richer over time.

Backend storage excels at recall, with >96% accuracy, sub-200 ms latency, and plug-and-play APIs; it is ideal for chatbots that need to remember user preferences. For continuously operating agents that read from and write to a shared knowledge base, however, the context-substrate approach is the only architecture that supports long-term, evolving understanding. We predict that within six months "context engineering" will replace "memory" as the default terminology for serious agent infrastructure, and context-substrate architectures will surpass fact-storage projects.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.
