Rethinking AI Memory: From Raw Ledger to Policy‑Driven Closed Loop

The article argues that AI memory is not mere storage but an external state that feeds decisions. It advances three core propositions: memory as decision‑usable external state, a minimal closed loop of Raw Ledger + Views + Policy, and event sequences as the fundamental unit. It then details how a System 1 + System 2 architecture, non‑parametric designs, temporal handling, and learnable policies together determine the practical limits of modern agentic memory systems.

AntData

Core propositions of memory

Proposition A: Memory is not a passive store; it is an external state that must be transformed into evidence, summaries, sub‑graphs, or executable skills and fed to the reasoning layer. The value lies in the channel from history to the current decision, not in the amount of stored data.

Proposition B: The minimal closed‑loop memory consists of three components: Raw Ledger (an authoritative, append‑only event log), Derived Views (indexed, compressed or materialized representations that are traceable back to the ledger), and Policy (a control layer that decides when and how to read, write, update or forget, emitting explicit action sequences).

Proposition C: The basic unit of memory is an event sequence, but a raw event stream alone is insufficient; useful memory requires Views and Policy to turn events into actionable information.
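The three components can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the names `Event`, `RawLedger`, `build_view` and `policy_write`, the key/value event shape, and the "skip duplicate writes" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int                     # position in the ledger, assigned on append
    op: str                      # "ADD", "UPDATE" or "DELETE"
    key: str
    value: Optional[str] = None


@dataclass
class RawLedger:
    """Append-only event log: events are never edited or removed."""
    events: List[Event] = field(default_factory=list)

    def append(self, op: str, key: str, value: Optional[str] = None) -> Event:
        event = Event(len(self.events), op, key, value)
        self.events.append(event)
        return event


def build_view(ledger: RawLedger) -> Dict[str, Event]:
    """Derived view: latest live event per key, rebuilt purely from the ledger."""
    view: Dict[str, Event] = {}
    for event in ledger.events:
        if event.op == "DELETE":
            view.pop(event.key, None)
        else:
            view[event.key] = event  # keeping the Event keeps provenance (seq)
    return view


def policy_write(ledger: RawLedger, key: str, value: str) -> Optional[Event]:
    """Toy policy (assumed rule): skip the write if the view already holds this value."""
    current = build_view(ledger).get(key)
    if current is not None and current.value == value:
        return None  # explicit no-op decision
    op = "UPDATE" if current is not None else "ADD"
    return ledger.append(op, key, value)


ledger = RawLedger()
policy_write(ledger, "user.city", "Hangzhou")
policy_write(ledger, "user.city", "Hangzhou")  # duplicate, skipped by the policy
policy_write(ledger, "user.city", "Shanghai")
view = build_view(ledger)
print(view["user.city"].value, view["user.city"].seq, len(ledger.events))
```

Because the view is rebuilt from the ledger alone, it can be discarded and regenerated at any time, and every entry points back to the ledger event that produced it.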

Why a non‑empty System 2 is needed

System 1 (the LLM weights) provides generic capabilities, while System 2 handles memory write, retrieval and update as explicit, observable and replayable processes. Without System 2, memory would be baked into model weights, limiting adaptability and making per‑user personalization hard to preserve.

The article supports this with a biological analogy: external tools let an agent extend its capabilities faster than internal weight updates can.

System 1 + System 2 design

                                                          (final answer/action)
+-------------------+      +---------------------------+      +------------------+
|    User/Env IO    | ---> | System 1: General Agent   | ---> | Output / Effect  |
+-------------------+      | (LLM + tools + planner)   |      +------------------+
                           +---------------------------+
                                         ^
                                         |
                                         | retrieved_context + provenance
                                         |
                                         v
+-------------------------------------------------------------------------------------+
|                        System 2: Agentic Memory (Slow Loop)                         |
|   PreThink --> Retrieve (loop) --> Evidence Accumulate --> Early Stop (conf >= τ)   |
+-------------------------------------------------------------------------------------+
| Memory Infra: Raw Ledger (ADD/UPDATE/DELETE) | Derived Views (vector, KG, timeline) |
+-------------------------------------------------------------------------------------+
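The slow loop in the diagram (PreThink, repeated retrieval, evidence accumulation, early stop once confidence reaches τ) might look roughly like the sketch below. The function names, the `(text, score)` evidence shape, and the noisy‑OR confidence estimate are assumptions chosen for illustration; the article does not specify how confidence is computed.

```python
from typing import Callable, List, Tuple

Evidence = Tuple[str, float]  # (text, relevance score in [0, 1])


def pre_think(query: str) -> str:
    """Stand-in for PreThink: here it only normalizes the query (an assumption)."""
    return " ".join(query.lower().split())


def slow_loop(
    query: str,
    retrieve: Callable[[str, int], List[Evidence]],
    tau: float = 0.8,
    max_rounds: int = 4,
) -> Tuple[List[Evidence], float, int]:
    """Retrieve in rounds, accumulate evidence, stop once confidence >= tau."""
    plan = pre_think(query)
    evidence: List[Evidence] = []
    confidence = 0.0
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        evidence.extend(retrieve(plan, rounds))
        # Assumed confidence model: noisy-OR over independent evidence scores.
        miss = 1.0
        for _, score in evidence:
            miss *= 1.0 - score
        confidence = 1.0 - miss
        if confidence >= tau:
            break  # early stop
    return evidence, confidence, rounds


def toy_retrieve(query: str, round_no: int) -> List[Evidence]:
    """Hypothetical retriever returning one mid-relevance hit per round."""
    return [(f"{query} / hit {round_no}", 0.5)]


evidence, confidence, rounds = slow_loop("Where does the user live?", toy_retrieve)
print(rounds, confidence)
```

With one 0.5‑scored hit per round, confidence goes 0.5, 0.75, 0.875, so the loop stops after the third round instead of using all four.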

Parametric vs. non‑parametric memory

Parametric memory: experiences are baked into model weights via training or fine‑tuning; inference uses the updated model directly.

Non‑parametric memory: experiences reside in external state (ledger + views + skill pool). Policy decides what to write and how to retrieve; during inference, retrieved evidence is injected as a controllable correction Δ to the model logits.

The key difference is where the adaptation operator sits: in training or fine‑tuning (weight updates) for parametric memory, and in online commit/retrieve operations for non‑parametric memory.
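To make the "controllable correction Δ" concrete, the sketch below reads it literally as an additive term on the logits, scaled by a knob `alpha`. This is an assumed simplification: in most deployed systems the correction arises indirectly from evidence placed in the context, and the token names and numbers here are invented for the example.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a small token-to-logit mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}


def apply_correction(
    base: Dict[str, float], delta: Dict[str, float], alpha: float = 1.0
) -> Dict[str, float]:
    """Add the memory-derived correction, scaled by alpha, to the base logits."""
    return {tok: v + alpha * delta.get(tok, 0.0) for tok, v in base.items()}


# The frozen model prefers the stale answer; retrieved evidence supports the new one.
base_logits = {"Hangzhou": 2.0, "Shanghai": 1.0}
delta = {"Shanghai": 3.0}

without_memory = softmax(apply_correction(base_logits, delta, alpha=0.0))
with_memory = softmax(apply_correction(base_logits, delta, alpha=1.0))

print(max(without_memory, key=without_memory.get))
print(max(with_memory, key=with_memory.get))
```

Setting `alpha` to 0 recovers the unmodified model, which is what makes the correction controllable: the weights never change, only the injected term does.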

Upper‑bound analysis of non‑parametric memory

Interface bandwidth: the amount of external evidence that can be injected into System 1 is bounded by token limits, attention capacity and latency.

Retrieval & aggregation error: Views are approximations of the ledger; errors (misses, temporal conflicts, semantic drift) directly corrupt the correction Δ.

Policy learnability & controllability: The Memory Algorithm Protocol must produce reliable action sequences; poor write/read decisions, noisy updates or irreversible mistakes degrade long‑term behavior.

Policy is often the most underestimated bottleneck because it must be both learnable (e.g., via RL) and auditable (actions must be recorded, replayable and A/B‑testable).
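One way to read "recorded and replayable" is that every policy decision is serialized to a log from which the memory state can be rebuilt exactly. The sketch below assumes a hypothetical action vocabulary (`WRITE`, `UPDATE`, `FORGET`, `READ`) and JSON lines as the log format; neither comes from the article.

```python
import json
from typing import Dict, List

Action = Dict[str, str]


def apply_action(state: Dict[str, str], action: Action) -> None:
    """Apply one policy action to the memory state."""
    op = action["op"]
    if op in ("WRITE", "UPDATE"):
        state[action["key"]] = action["value"]
    elif op == "FORGET":
        state.pop(action["key"], None)
    # "READ" leaves the state unchanged but is still logged for auditing.


def replay(log_lines: List[str]) -> Dict[str, str]:
    """Rebuild the memory state from the serialized action log alone."""
    state: Dict[str, str] = {}
    for line in log_lines:
        apply_action(state, json.loads(line))
    return state


# Live run: log each action before applying it.
live_state: Dict[str, str] = {}
action_log: List[str] = []
for action in [
    {"op": "WRITE", "key": "diet", "value": "vegetarian"},
    {"op": "READ", "key": "diet"},
    {"op": "UPDATE", "key": "diet", "value": "vegan"},
    {"op": "WRITE", "key": "tmp", "value": "scratch"},
    {"op": "FORGET", "key": "tmp"},
]:
    action_log.append(json.dumps(action, sort_keys=True))
    apply_action(live_state, action)

print(replay(action_log) == live_state, live_state)
```

The same log can be replayed under a modified `apply_action` or a different policy to compare outcomes, which is the basis for the A/B testing the text mentions.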

Temporal dimension as structural backbone

Time is not mere metadata; it is a structural dimension that must be represented in the ledger (transaction_time vs. valid_time), in views (time‑sliced retrieval) and in policy (validity gating, tombstone handling, decay). Bi‑temporal models such as Zep/Graphiti enforce hard constraints that prevent "old facts" from being treated as current.
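The distinction between transaction_time and valid_time can be illustrated with a small bi‑temporal query. This is a sketch under stated assumptions, not Zep's or Graphiti's API: the `Fact` fields, the `as_of` helper, and the rule that a later record with the same `(value, valid_from)` supersedes an earlier one are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: datetime          # valid time: when it became true in the world
    valid_to: Optional[datetime]  # None means still valid
    recorded_at: datetime         # transaction time: when the system learned it


def as_of(
    facts: List[Fact], subject: str, valid_at: datetime, known_at: datetime
) -> Optional[Fact]:
    """Return the fact valid at `valid_at`, using only records known by `known_at`."""
    latest: Dict[Tuple[str, datetime], Fact] = {}
    for fact in sorted(facts, key=lambda f: f.recorded_at):
        if fact.subject == subject and fact.recorded_at <= known_at:
            # Assumed rule: a later record of the same fact supersedes the earlier one.
            latest[(fact.value, fact.valid_from)] = fact
    live = [
        f for f in latest.values()
        if f.valid_from <= valid_at and (f.valid_to is None or valid_at < f.valid_to)
    ]
    return max(live, key=lambda f: f.valid_from, default=None)


facts = [
    # Learned on 2023-01-05 that the user has lived in Hangzhou since 2023-01-01.
    Fact("user.city", "Hangzhou", datetime(2023, 1, 1), None, datetime(2023, 1, 5)),
    # Learned on 2024-06-10 that the user moved on 2024-06-01: close the old
    # interval with a new record (append-only) and add the new fact.
    Fact("user.city", "Hangzhou", datetime(2023, 1, 1), datetime(2024, 6, 1),
         datetime(2024, 6, 10)),
    Fact("user.city", "Shanghai", datetime(2024, 6, 1), None, datetime(2024, 6, 10)),
]

now = datetime(2024, 7, 1)
current = as_of(facts, "user.city", valid_at=now, known_at=now)
past = as_of(facts, "user.city", valid_at=datetime(2024, 5, 1), known_at=now)
belief = as_of(facts, "user.city", valid_at=now, known_at=datetime(2024, 6, 5))
print(current.value, past.value, belief.value)
```

The three queries separate what is true now (Shanghai), what was true in May (Hangzhou), and what the system believed on 2024-06-05 before it learned of the move (Hangzhou), so the stale fact is filtered structurally rather than left for the LLM to resolve.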

Memory modules and their roles

Kernel / Control Plane (System 2): decides when to read/write, orchestrates planners, and emits explicit action logs.

File System / Storage Plane: Raw Ledger plus derived, time‑aware views; supports consolidation, compression and provenance.

Executable / Skill Plane: stores procedural memories (skills, macros) that can be invoked as actions; requires verification and governance.

Interface / Context Bridge: injects external state into the transformer (e.g., via memory tokens, KV‑cache injection) while preserving observability.

Learning Engine: online adaptation that updates policies, skill scores or retrieval strategies without changing model weights.
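As a small illustration of how the Skill Plane and Learning Engine could interact without touching model weights, the sketch below keeps a running success score per skill and retires skills that stay unreliable. The `SkillPool` class, the exponential moving average update, the thresholds, and the skill names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Skill:
    name: str
    score: float = 0.5  # assumed neutral prior for the success rate
    uses: int = 0


class SkillPool:
    def __init__(self, lr: float = 0.3, retire_below: float = 0.2, min_uses: int = 3):
        self.lr = lr
        self.retire_below = retire_below
        self.min_uses = min_uses
        self.skills: Dict[str, Skill] = {}

    def record(self, name: str, success: bool) -> float:
        """Online update: move the score toward 1 on success, toward 0 on failure."""
        skill = self.skills.setdefault(name, Skill(name))
        target = 1.0 if success else 0.0
        skill.score += self.lr * (target - skill.score)
        skill.uses += 1
        return skill.score

    def active(self) -> List[str]:
        """Skills still eligible: retire only after enough uses and a low score."""
        return sorted(
            s.name for s in self.skills.values()
            if not (s.uses >= self.min_uses and s.score < self.retire_below)
        )

    def best(self) -> Optional[str]:
        names = self.active()
        return max(names, key=lambda n: self.skills[n].score) if names else None


pool = SkillPool()
for _ in range(3):
    pool.record("parse_invoice", success=True)
    pool.record("legacy_export", success=False)

print(pool.best(), pool.active())
```

All adaptation lives in the pool's scores, so it can be inspected, reset, or rolled back independently of the underlying model.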

Key recent works referenced

AgeMem – RL‑trained memory‑tool usage.

InfMem – PreThink‑Retrieve‑Write protocol with adaptive early stopping.

SimpleMem – Recursive consolidation of memory units, achieving ~1/30 token consumption on long‑dialogue tasks.

UMEM – Semantic neighborhoods built by cosine similarity and GRPO‑based reward modeling.

LycheeMemory – Latent memory tokens injected into KV‑cache, reducing encode/decode overhead.

MemAdapter – Generative sub‑graph retrieval for heterogeneous memories.

ProcMEM – Skill‑MDP with non‑parametric PPO for skill evaluation and online maintenance.

Final takeaways

Memory is a closed‑loop system, not a passive store.

A non‑empty System 2 is essential for scalable, plug‑and‑play memory.

The ceiling of non‑parametric memory is governed by interface bandwidth, view error and policy quality.

Temporal reasoning must be built into the architecture, not left to LLM inference.

The five‑module abstraction (Kernel, File System, Skill Plane, Interface, Learning Engine) captures the necessary components without tying them to specific implementations.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

AntData

Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.
