Rethinking AI Memory: From Raw Ledger to Policy‑Driven Closed Loop
The article argues that AI memory is not mere storage but an external state that feeds decisions. It advances three core propositions: memory as decision‑usable external state, a minimal closed loop of Raw Ledger + Views + Policy, and event sequences as the fundamental unit. It then details how a System 1 + System 2 architecture, non‑parametric designs, temporal handling, and learnable policies together set the practical limits of modern agentic memory systems.
Core propositions of memory
Proposition A: Memory is not a passive store; it is an external state that must be transformed into evidence, summaries, sub‑graphs, or executable skills and fed to the reasoning layer. The value lies in the channel from history to the current decision, not in the amount of stored data.
Proposition B: The minimal closed‑loop memory consists of three components: Raw Ledger (authoritative append‑only event log), Derived Views (indexed, compressed or materialized representations that are traceable back to the ledger), and Policy (a control layer that decides when and how to read, write, update or forget, emitting explicit action sequences).
Proposition C: The basic unit of memory is an event sequence, but a raw event stream alone is insufficient; useful memory requires Views and Policy to turn events into actionable information.
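The three components in Proposition B can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the names Event, RawLedger, latest_view and write_policy are hypothetical, and the policy is a toy rule rather than a learned one.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    seq: int          # position in the ledger, kept for provenance
    op: str           # "ADD", "UPDATE" or "DELETE"
    key: str
    value: Any = None


@dataclass
class RawLedger:
    """Authoritative append-only log: events are appended, never rewritten."""
    events: list = field(default_factory=list)

    def append(self, op: str, key: str, value: Any = None) -> Event:
        event = Event(len(self.events), op, key, value)
        self.events.append(event)
        return event


def latest_view(ledger: RawLedger) -> dict:
    """Derived view: current value per key plus the ledger seq it came from."""
    view = {}
    for e in ledger.events:
        if e.op == "DELETE":
            view.pop(e.key, None)
        else:
            view[e.key] = (e.value, e.seq)
    return view


def write_policy(view: dict, key: str, value: Any) -> str:
    """Toy policy (assumption): return an explicit action name for each write."""
    if key not in view:
        return "ADD"
    return "NOOP" if view[key][0] == value else "UPDATE"


ledger = RawLedger()
for key, value in [("city", "Paris"), ("city", "Paris"), ("city", "Berlin")]:
    action = write_policy(latest_view(ledger), key, value)
    if action != "NOOP":
        ledger.append(action, key, value)

print([e.op for e in ledger.events])
print(latest_view(ledger))
```

Because the view stores the ledger sequence number next to each value, any answer built from the view can be traced back to the event that produced it.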
Why a non‑empty System 2 is needed
System 1 (the LLM weights) provides generic capabilities, while System 2 handles memory write, retrieval and update as explicit, observable and replayable processes. Without System 2, memory would be baked into model weights, limiting adaptability and making per‑user personalization hard to preserve.
By analogy with biology, external tools let an agent extend its capabilities faster than internal weight updates can.
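One way to make retrieval an explicit, observable and replayable process is an iterative loop that records each step and stops once accumulated confidence reaches a threshold τ. The sketch below is hypothetical: slow_loop and toy_retrieve are illustrative names, the per-hit confidence gains are made up, and the noisy-OR rule for combining them is an assumption rather than anything the article prescribes.

```python
def slow_loop(query, retrieve, tau=0.8, max_rounds=5):
    """Iteratively retrieve evidence, stopping early once confidence >= tau.

    `retrieve(query, round_no)` must return a list of (text, gain) pairs,
    where gain in [0, 1] is how much that hit supports an answer.
    """
    trace = [("PRETHINK", query)]   # explicit action log, replayable later
    evidence = []
    conf = 0.0
    for round_no in range(max_rounds):
        hits = retrieve(query, round_no)
        trace.append(("RETRIEVE", round_no, len(hits)))
        for text, gain in hits:
            evidence.append(text)
            # Noisy-OR accumulation (assumption): independent evidence adds up.
            conf = 1.0 - (1.0 - conf) * (1.0 - gain)
        if conf >= tau:
            trace.append(("EARLY_STOP", round_no))
            break
    return evidence, conf, trace


def toy_retrieve(query, round_no):
    """Stand-in retriever with fixed hits per round (illustrative data)."""
    rounds = [
        [("user moved to Berlin in 2023", 0.6)],
        [("user's office is in Berlin", 0.6)],
        [("user likes hiking", 0.1)],
    ]
    return rounds[round_no] if round_no < len(rounds) else []


evidence, conf, trace = slow_loop("where does the user live?", toy_retrieve)
for step in trace:
    print(step)
```

With these toy gains the loop stops after the second round, so the third, weakly relevant hit is never fetched. The trace is what makes the run auditable.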
System 1 + System 2 design
                                                          (final answer/action)
+-------------------+      +---------------------------+      +------------------+
|    User/Env IO    | ---> | System 1: General Agent   | ---> | Output / Effect  |
+-------------------+      | (LLM + tools + planner)   |      +------------------+
                           +---------------------------+
                                         ^
                                         |
                                         | retrieved_context + provenance
                                         |
                                         v
+-------------------------------------------------------------------------------------+
| System 2: Agentic Memory (Slow Loop)                                                |
| PreThink --> Retrieve (loop) --> Evidence Accumulate --> Early Stop (conf >= τ)     |
+-------------------------------------------------------------------------------------+
| Memory Infra: Raw Ledger (ADD/UPDATE/DELETE) | Derived Views (vector, KG, timeline) |
+-------------------------------------------------------------------------------------+
Parametric vs. non‑parametric memory
Parametric memory: experiences are baked into model weights via training or fine‑tuning; inference uses the updated model directly.
Non‑parametric memory: experiences reside in external state (ledger + views + skill pool). Policy decides what to write and how to retrieve; during inference, retrieved evidence is injected as a controllable correction Δ to the model logits.
The key difference is where the adaptation operator is placed: in training or fine‑tuning for parametric memory, in online commit/retrieve for non‑parametric memory.
Upper‑bound analysis of non‑parametric memory
Interface bandwidth: the amount of external evidence that can be injected into System 1 is bounded by token limits, attention capacity and latency.
Retrieval & aggregation error: Views are approximations of the ledger; errors (misses, temporal conflicts, semantic drift) directly corrupt the correction Δ.
Policy learnability & controllability: The Memory Algorithm Protocol must produce reliable action sequences; poor write/read decisions, noisy updates or irreversible mistakes degrade long‑term behavior.
Policy is often the most underestimated bottleneck because it must be both learnable (e.g., via RL) and auditable (actions must be recorded, replayable and A/B‑testable).
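The interface-bandwidth bound can be made concrete with a small packing routine. The sketch below greedily selects the highest-scoring evidence that fits a token budget. It is illustrative only: pack_evidence is a hypothetical helper, token cost is approximated by whitespace-separated words rather than a real tokenizer, and the relevance scores are invented.

```python
def pack_evidence(candidates, token_budget):
    """Greedily pack evidence under a token budget.

    `candidates` is a list of (ledger_seq, score, text). Returns the chosen
    (ledger_seq, text) pairs, so provenance survives, and the tokens used.
    """
    chosen = []
    used = 0
    # Highest relevance first; skip anything that would overflow the budget.
    for seq, score, text in sorted(candidates, key=lambda c: c[1], reverse=True):
        cost = len(text.split())   # crude token estimate (assumption)
        if used + cost <= token_budget:
            chosen.append((seq, text))
            used += cost
    return chosen, used


candidates = [
    (0, 0.9, "user lives in Berlin"),
    (1, 0.5, "user asked about visas for a long trip to Japan"),
    (2, 0.7, "user prefers trains"),
]
chosen, used = pack_evidence(candidates, token_budget=8)
print(chosen, used)
```

Whatever falls outside the budget never reaches System 1, which is why a larger store does not by itself raise the ceiling.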
Temporal dimension as structural backbone
Time is not mere metadata; it is a structural dimension that must be represented in the ledger (transaction_time vs. valid_time), in views (time‑sliced retrieval) and in policy (validity gating, tombstone handling, decay). Bi‑temporal models such as Zep/Graphiti enforce hard constraints that prevent "old facts" from being treated as current.
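The distinction between transaction_time and valid_time can be illustrated with a small time-sliced lookup. This is a sketch under stated assumptions, not the Zep/Graphiti API: Fact and as_of are hypothetical names, times are plain integers (years), and a correction is recorded by appending new rows rather than editing old ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: int            # when the fact became true in the world
    valid_to: Optional[int]    # exclusive end; None means still valid
    transaction_time: int      # when the system recorded this row


def as_of(facts, subject, valid_at, known_at):
    """Value valid at `valid_at`, using only rows recorded by `known_at`."""
    visible = [
        f for f in facts
        if f.subject == subject
        and f.transaction_time <= known_at
        and f.valid_from <= valid_at
        and (f.valid_to is None or valid_at < f.valid_to)
    ]
    if not visible:
        return None
    # The most recently recorded row wins: later knowledge supersedes earlier.
    return max(visible, key=lambda f: f.transaction_time).value


facts = [
    Fact("home", "Paris", 2020, None, transaction_time=2020),
    # In 2024 the system learns the user moved in 2023: close the old
    # interval and open a new one, both recorded at transaction_time 2024.
    Fact("home", "Paris", 2020, 2023, transaction_time=2024),
    Fact("home", "Berlin", 2023, None, transaction_time=2024),
]

print(as_of(facts, "home", valid_at=2024, known_at=2024))
print(as_of(facts, "home", valid_at=2021, known_at=2024))
print(as_of(facts, "home", valid_at=2024, known_at=2021))
```

The third query shows what the system believed in 2021 about 2024, which is the kind of replay that a single timestamp column cannot answer.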
Memory modules and their roles
Kernel / Control Plane (System 2): decides when to read/write, orchestrates planners, and emits explicit action logs.
File System / Storage Plane: Raw Ledger plus derived, time‑aware views; supports consolidation, compression and provenance.
Executable / Skill Plane: stores procedural memories (skills, macros) that can be invoked as actions; requires verification and governance.
Interface / Context Bridge: injects external state into the transformer (e.g., via memory tokens, KV‑cache injection) while preserving observability.
Learning Engine: online adaptation that updates policies, skill scores or retrieval strategies without changing model weights.
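As a rough illustration of the Learning Engine adapting without touching model weights, the sketch below keeps a score per skill and nudges it after each outcome. It uses a plain exponential moving average as a deliberately simple stand-in and does not reproduce the RL methods of the works listed below. The names update_score and best_skill, the prior of 0.5 and the step size alpha are all assumptions.

```python
def update_score(scores, skill, success, alpha=0.3, prior=0.5):
    """Move a skill's score toward 1.0 on success and 0.0 on failure."""
    old = scores.get(skill, prior)
    scores[skill] = (1.0 - alpha) * old + alpha * (1.0 if success else 0.0)
    return scores[skill]


def best_skill(scores):
    """Pick the skill with the highest current score."""
    return max(scores, key=scores.get)


scores = {}   # external state: lives outside the model
update_score(scores, "book_flight", success=True)
update_score(scores, "book_flight", success=False)
update_score(scores, "search_docs", success=True)

print(scores)
print(best_skill(scores))
```

The scores dictionary is ordinary external state, so it can be logged, rolled back or A/B-tested like any other memory write.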
Key recent works referenced
AgeMem – RL‑trained memory‑tool usage.
InfMem – PreThink‑Retrieve‑Write protocol with adaptive early stopping.
SimpleMem – Recursive consolidation of memory units, achieving ~1/30 token consumption on long‑dialogue tasks.
UMEM – Semantic neighborhoods built by cosine similarity and GRPO‑based reward modeling.
LycheeMemory – Latent memory tokens injected into KV‑cache, reducing encode/decode overhead.
MemAdapter – Generative sub‑graph retrieval for heterogeneous memories.
ProcMEM – Skill‑MDP with non‑parametric PPO for skill evaluation and online maintenance.
Final takeaways
Memory is a closed‑loop system, not a passive store.
A non‑empty System 2 is essential for scalable, plug‑and‑play memory.
The ceiling of non‑parametric memory is governed by interface bandwidth, view error and policy quality.
Temporal reasoning must be built into the architecture, not left to LLM inference.
The five‑module abstraction (Kernel, File System, Skill Plane, Interface, Learning Engine) captures the necessary components without tying them to specific implementations.
AntData
Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.
