Rethinking Agent Memory: From Raw Ledgers to Non‑Parametric Systems
This article analyses the nature of memory for LLM‑based agents, arguing that memory is a closed‑loop system composed of a raw ledger, derived views, and a policy layer. It explores how non‑parametric designs, System‑2 architectures, temporal structuring, and skill‑based execution can narrow the gap between parametric and non‑parametric memory, and highlights key bottlenecks and practical design guidelines.
What Memory Really Is
Memory for an agent is defined as the long‑term, searchable, and reusable record of interactions that directly influences personalization, continual learning, and long‑range task performance. It is not merely a dump of historical data; its value lies in turning history into actionable evidence for decision making.
Core Propositions
Proposition A: Memory is not "storage" but "external state that can be used by the decision module".
Proposition B: The minimal viable memory consists of three components – a Raw Ledger (authoritative append‑only log), Derived Views (indexed, compressed, or graph‑based representations), and a Policy layer that decides when and how to read, write, or forget.
Proposition C: The basic unit of memory should be an event sequence, not a static document block.
Raw Ledger
An append‑only record that captures scope, timestamp, input observation, system actions, memory changes (ADD/UPDATE/DELETE/NONE), optional feedback signals, and optional decision metadata. It serves as the immutable source of truth for provenance and rollback.
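As a concrete illustration, here is a minimal sketch of one ledger entry and an append‑only log in Python. The field names and the class shapes are illustrative assumptions, not a prescribed schema:

```python
# Minimal sketch of a raw-ledger entry; field names are illustrative
# assumptions, not a prescribed schema.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

@dataclass(frozen=True)  # frozen: entries are immutable once appended
class LedgerEntry:
    scope: str                        # e.g. user, session, or task identifier
    transaction_time: datetime        # when the system recorded the event
    observation: str                  # the input observation
    actions: list[str]                # system actions taken this step
    memory_ops: list[dict[str, Any]]  # ADD/UPDATE/DELETE/NONE with payloads
    feedback: Optional[float] = None  # optional feedback signal
    decision_meta: dict[str, Any] = field(default_factory=dict)

class RawLedger:
    """Append-only log: the authoritative source of truth."""
    def __init__(self) -> None:
        self._entries: list[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> int:
        self._entries.append(entry)      # no update/delete methods: the ledger
        return len(self._entries) - 1    # only grows; the offset is stable provenance

    def get(self, offset: int) -> LedgerEntry:
        return self._entries[offset]     # replay/rollback reads by offset
```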
Derived Views
Views are lossy but traceable structures built from the ledger, such as vector indexes, keyword/BM25 indexes, knowledge graphs, timelines, or skill indexes. They must be able to trace back to the raw ledger for auditability.
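For example, a vector‑index entry might keep the ledger offsets it was derived from, so any retrieval hit can be resolved back to the authoritative events. A small sketch, reusing the RawLedger sketch above (the structure is an illustrative assumption):

```python
# Sketch of a derived-view entry that stays traceable to the raw ledger.
# The embedding field is a stand-in for a real vector index.
from dataclasses import dataclass

@dataclass
class ViewEntry:
    text: str                  # lossy, compressed representation
    embedding: list[float]     # vector used for similarity search
    source_offsets: list[int]  # ledger offsets this entry was derived from

def trace_back(hit: ViewEntry, ledger: "RawLedger") -> list["LedgerEntry"]:
    """Resolve a retrieval hit to the events that produced it."""
    return [ledger.get(i) for i in hit.source_offsets]
```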
Policy Layer
The policy decides what to read, when to write, and how to update. Its output must be an explicit action sequence (ADD/UPDATE/DELETE/NONE) that can be recorded and replayed, rather than hidden in a prompt.
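A sketch of what such an explicit, replayable action record could look like (all names below are assumptions for illustration):

```python
# The policy emits explicit, loggable actions rather than hiding its
# decisions inside a prompt. Field names are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MemoryOp(Enum):
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NONE = "NONE"

@dataclass(frozen=True)
class PolicyAction:
    op: MemoryOp
    target_id: Optional[str]  # which memory unit is affected (None for ADD/NONE)
    payload: Optional[str]    # content to write, if any
    rationale: str            # recorded so the decision can be audited and replayed
```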
System 1 + System 2 Design
System 1 is the fast, parametric LLM that performs inference, planning, and tool use. System 2 is a slower, external loop that manages memory read/write, retrieval, and policy decisions. System 2 is essential because it keeps memory operations observable, auditable, and A/B‑testable, preventing the parametric model from becoming a monolithic black box.
Why System 2 Is Needed
Parametric memory embeds knowledge in weights, making updates costly and risking loss of generalization.
Non‑parametric memory externalizes knowledge, allowing rapid online adaptation without retraining.
System 2 provides a clear separation of concerns, enabling plug‑and‑play memory modules and easier debugging.
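A minimal sketch of how the two loops compose, assuming `llm`, `retrieve`, and `policy` are caller‑supplied functions rather than any specific library's API:

```python
# System 1 (fast LLM call) wrapped by System 2 (explicit memory reads and
# writes). All callables here are assumptions supplied by the caller.
def answer_with_memory(query: str, llm, retrieve, policy, action_log: list) -> str:
    evidence = retrieve(query)                 # System 2: external read
    answer = llm(query, context=evidence)      # System 1: fast inference
    actions = policy(query, answer, evidence)  # System 2: explicit write decision
    action_log.extend(actions)                 # every memory change is recorded,
    return answer                              # so it is observable and A/B-testable
```

Because the read and write steps are ordinary function calls outside the model, each one can be logged, replayed, and swapped out independently of the LLM.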
Parametric vs Non‑Parametric Memory
Parametric memory writes experience into model weights via training or fine‑tuning. Non‑parametric memory stores experience in external state (ledger + views + skill pool) and influences the model through retrieval, aggregation, and injection. The key difference is where the adaptation operator resides: in the training phase for parametric, and in the online inference phase for non‑parametric.
Non‑Parametric Upper Bound
To approach the performance of fine‑tuned models, a non‑parametric system must (1) inject high‑quality evidence into the LLM, (2) ensure retrieval accuracy, and (3) have a learnable, controllable policy. The upper bound is therefore limited by interface bandwidth, view approximation error, and policy learnability.
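One way to make this concrete is a conceptual decomposition of the gap to a fine‑tuned reference (the symbols are introduced here purely for illustration; this is not a derived theorem):

$$
J_{\text{fine-tuned}} - J_{\text{non-param}} \;\lesssim\; \varepsilon_{\text{bandwidth}} + \varepsilon_{\text{view}} + \varepsilon_{\text{policy}}
$$

where $\varepsilon_{\text{bandwidth}}$ reflects how much evidence the injection interface can carry, $\varepsilon_{\text{view}}$ the information lost when views compress the ledger, and $\varepsilon_{\text{policy}}$ the distance between the learned read/write policy and the best achievable one.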
Policy Design and Learning
Traditional rule‑based (rubric) policies are insufficient. Two viable approaches are:
Training an external neural network to output read/write actions.
Prompting, fine‑tuning, or training an LLM with reinforcement learning to act as the policy.
The policy must be observable (recorded actions), auditable (provenance), and amenable to RL‑style credit assignment despite delayed rewards.
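A hedged sketch of the second approach, where an LLM is prompted to emit a machine‑readable action list that is parsed, validated, and logged (the prompt format and the `call_llm` helper are assumptions for illustration):

```python
# "LLM as policy": the model outputs explicit actions as JSON, which are
# validated and recorded downstream for replay and RL-style credit assignment.
import json

POLICY_PROMPT = """Given the dialogue turn and candidate memories below,
output a JSON list of actions, each of the form
{{"op": "ADD|UPDATE|DELETE|NONE", "target_id": ..., "payload": ..., "rationale": ...}}.
Turn: {turn}
Candidates: {candidates}"""

def llm_policy(turn: str, candidates: list[str], call_llm) -> list[dict]:
    raw = call_llm(POLICY_PROMPT.format(turn=turn, candidates=candidates))
    actions = json.loads(raw)  # explicit actions, not hidden prompt state
    for action in actions:
        assert action["op"] in {"ADD", "UPDATE", "DELETE", "NONE"}
    return actions             # append these to the recorded action log
```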
Memory Unit Structure, Temporal Aspects, and Consolidation
Memory units can be compressed (e.g., SimpleMem) by clustering similar events into higher‑level representations. Temporal information is crucial: each event carries both transaction_time (when the system recorded it) and valid_time (when the fact was true in the world). Views must respect time slices to avoid treating outdated facts as current.
Consolidation should respect change points, avoid crossing them, and optionally produce narrative summaries that remain traceable to the raw ledger.
Temporal Memory
Bi‑temporal models separate transaction and validity times, enabling hard filtering (only facts valid at query time are considered) and soft weighting (recency, stability). This prevents "old facts become current" errors and supports queries like WHEN, WHAT‑IF, and historical analysis.
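A minimal sketch of bi‑temporal retrieval, combining the hard validity filter with a soft recency weight (the field names and the half‑life decay are illustrative assumptions):

```python
# Bi-temporal retrieval sketch: hard-filter on valid_time, then
# soft-weight by recency so stale-but-valid facts rank lower.
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    content: str
    transaction_time: datetime    # when the system recorded the fact
    valid_from: datetime          # when the fact became true in the world
    valid_to: Optional[datetime]  # None = still valid

def facts_as_of(facts: list[Fact], query_time: datetime,
                half_life_days: float = 30.0) -> list[tuple[Fact, float]]:
    """Facts valid at query_time, ranked by a recency weight."""
    valid = [f for f in facts
             if f.valid_from <= query_time
             and (f.valid_to is None or query_time < f.valid_to)]  # hard filter
    def weight(f: Fact) -> float:
        age_days = (query_time - f.valid_from).total_seconds() / 86400.0
        return math.exp(-math.log(2) * age_days / half_life_days)  # soft recency
    return sorted(((f, weight(f)) for f in valid), key=lambda pair: -pair[1])
```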
Procedural (Skill) Memory
Beyond declarative facts, agents need procedural memory – reusable skills or macros that encode "how to do" a task. Works such as ProcMEM introduce a three‑stage pipeline: (1) generate candidate skills from successful trajectories, (2) validate them with a PPO‑style gate that estimates advantage without changing model weights, and (3) maintain a skill pool using online scoring and pruning.
Skills are stored as {trigger, execute, terminate} triples, can be compact (a few hundred tokens), and are transferable across tasks and even across different LLM backbones.
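A sketch of such a triple and a simple pool‑invocation loop (the representation below follows the description above; ProcMEM's concrete format may differ):

```python
# Procedural memory as {trigger, execute, terminate} triples, with a
# scored pool. This is an illustrative sketch, not ProcMEM's actual API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    trigger: Callable[[dict], bool]    # should this skill fire in this state?
    execute: Callable[[dict], dict]    # one step of the reusable macro
    terminate: Callable[[dict], bool]  # has the macro finished?
    score: float = 0.0                 # online score used for pruning the pool

def run_skill_pool(state: dict, pool: list[Skill], max_steps: int = 20) -> dict:
    for skill in sorted(pool, key=lambda s: -s.score):  # best-scored first
        if skill.trigger(state):
            for _ in range(max_steps):
                state = skill.execute(state)
                if skill.terminate(state):
                    break
            break  # invoke one skill per decision point, for clarity
    return state
```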
Integration Layer – Machine‑Native Memory Tokens
Traditional pipelines convert external memory to text, concatenate it to the prompt, and re‑encode. Recent works (LycheeMemory, MemAdapter) aim for machine‑native representations that are injected directly into the transformer’s KV‑cache or attention mechanism, reducing encoding overhead.
Two challenges remain:
Selection/gating is still required because attention capacity is limited.
Provenance and observability must be preserved so that each injected token can be traced back to its source.
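A hedged sketch of the gating step: score candidate memory segments and keep only what fits the attention/token budget, carrying provenance along with each kept segment (illustrative only, not LycheeMemory's or MemAdapter's actual interface):

```python
# Selection/gating before injection: greedily keep the top-scoring
# segments under a hard token budget, preserving provenance per segment.
from dataclasses import dataclass

@dataclass
class MemorySegment:
    tokens: int                # cost against the attention/KV budget
    relevance: float           # retrieval or compressor score
    ledger_offsets: list[int]  # provenance: which ledger events produced it

def gate(segments: list[MemorySegment], budget: int) -> list[MemorySegment]:
    chosen, used = [], 0
    for seg in sorted(segments, key=lambda s: -s.relevance):
        if used + seg.tokens <= budget:  # greedy fill under the hard budget
            chosen.append(seg)
            used += seg.tokens
    return chosen  # every injected segment remains traceable to its sources
```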
LycheeMemory
Trains a compressor that maps input chunks to latent tokens resembling KV‑cache entries, which are then injected directly into attention.
MemAdapter
Provides a zero‑shot adapter that aligns heterogeneous memories (text, graphs, skills) to the LLM’s semantic space via generative sub‑graph retrieval.
Key Architectural Blueprint
The proposed five‑component architecture is deliberately implementation‑agnostic; a minimal interface sketch follows the list:
Kernel (Control Plane): System 2 scheduler and policy executor that emits explicit, logged actions.
File System (Storage Plane): Raw ledger plus derived, time‑aware views with full provenance.
Executable Files (Skill Plane): Verified, reusable macros that can be invoked by the agent.
Bus Interface (Context Bridge): Low‑overhead injection of machine‑native tokens while preserving traceability.
Learning Engine (Online Adaptation): Continuous improvement via advantage modulation, PPO‑style skill gating, or other non‑parametric updates.
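One way to render the five planes as minimal Python protocols (every signature below is an illustrative assumption; the blueprint itself prescribes no concrete API):

```python
# The five planes as minimal interfaces. All names and signatures are
# assumptions for illustration only.
from typing import Any, Protocol

class Kernel(Protocol):          # Control Plane: schedules and emits logged actions
    def decide(self, query: str, ctx: dict) -> list[dict]: ...

class FileSystem(Protocol):      # Storage Plane: ledger plus time-aware views
    def append(self, entry: dict) -> int: ...
    def view(self, name: str, as_of: Any) -> list[dict]: ...

class SkillPlane(Protocol):      # Skill Plane: verified, reusable macros
    def invoke(self, skill_name: str, state: dict) -> dict: ...

class BusInterface(Protocol):    # Context Bridge: budgeted, traceable injection
    def inject(self, segments: list[dict], budget: int) -> Any: ...

class LearningEngine(Protocol):  # Online Adaptation: non-parametric updates
    def update(self, trajectory: list[dict], reward: float) -> None: ...
```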
Conclusions
Memory should be viewed as a closed‑loop system (Raw Ledger → Views → Policy → Commit → Provenance). System 2 is indispensable for scalable, observable, and maintainable memory. Non‑parametric memory’s performance ceiling is governed by interface bandwidth, view approximation error, and policy learnability, with temporal structuring being a fundamental architectural dimension rather than mere metadata. The five‑component blueprint provides a roadmap for building robust, plug‑and‑play memory systems without tying the design to any specific implementation.
+-------------------+      +---------------------------+      +---------------------------+
|   User / Env IO   | ---> |  System 1: General Agent  | ---> |      Output / Effect      |
+-------------------+      |  (LLM + tools + planner)  |      |  (final answer / action)  |
                           +---------------------------+      +---------------------------+
                                |                   ^
        memory_tool(query, ctx) |                   | retrieved_context + provenance
                                v                   |
+-----------------------------------------------------------------------------------+
|                       System 2: Agentic Memory (Slow Loop)                         |
|                                                                                     |
|  PreThink → Retrieve (loop) → Evidence Accumulate → Early Stop (conf >= tau)        |
|                                                                                     |
|  +----------------------+    +----------------------+    +----------------------+  |
|  |     Memory Infra     |    |      Raw Ledger      |    |    Derived Views     |  |
|  |   (ledger + views)   |    |   (authoritative)    |    | (vector / KG / ...)  |  |
|  +----------------------+    +----------------------+    +----------------------+  |
|  |   Policy (control)   |    |  ADD/UPDATE/DELETE   |    |         ...          |  |
|  +----------------------+    +----------------------+    +----------------------+  |
+-----------------------------------------------------------------------------------+