
Why Memory Architecture Remains Elusive: An In‑Depth Analysis of Agent Memory Systems

The article argues that memory for AI agents is not mere storage but a closed‑loop system comprising a raw ledger, derived views, and a policy layer, and examines how non‑parametric memory, time‑aware structures, and system‑2 control affect scalability, reliability, and performance.

ITPUB

Jun 2, 2026

Memory as External State for Decision‑Making

Memory for long‑term interactive agents is the set of records, knowledge and experiences that can be retrieved and used to influence the current decision. Its value lies not in the amount of stored history but in how that history is transformed into usable information for the decision.

Core Propositions

Proposition A: Memory is not merely "storage"; it is external state that can be consumed by the agent.

Proposition B: The minimal closed‑loop memory consists of three components (a Raw Ledger, Derived Views and a Policy layer) rather than a single document block.

Proposition C: The atomic unit of memory is an event sequence, not a raw stream of events.

Raw Ledger

The ledger is an append‑only log. Every write, update or delete is recorded as a new event with the following fields:

Scope: user, session or task identifier.

Timestamp: wall‑clock time of the event.

Input Observation: the incoming messages or environment snapshot.

System Action: the agent’s output or a memory‑tool operation.

Memory Change: ADD / UPDATE / DELETE / NONE.

Feedback (optional): reward, user rating or task success.

Decision Metadata (optional): candidate set, provenance, early‑stop threshold.

The ledger serves as an immutable audit trail ordered by transaction_time (when the system recorded the event). Each event also carries a valid_time indicating when the fact is true in the world.
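The field list above can be sketched as a small data structure. This is a minimal illustration, not the article's implementation: the class names `LedgerEvent`, `RawLedger` and `MemoryChange` are hypothetical, and mapping the Timestamp field onto `transaction_time` is an assumption based on the bi‑temporal description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class MemoryChange(Enum):
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NONE = "NONE"


@dataclass(frozen=True)  # frozen: a recorded event cannot be modified afterwards
class LedgerEvent:
    scope: str                    # user, session or task identifier
    transaction_time: datetime    # when the system recorded the event (assumed = Timestamp)
    valid_time: datetime          # when the fact is true in the world
    observation: str              # incoming message or environment snapshot
    action: str                   # agent output or memory-tool operation
    change: MemoryChange
    feedback: Optional[float] = None
    decision_metadata: dict[str, Any] = field(default_factory=dict)


class RawLedger:
    """Append-only log: events can be added and read, never edited in place."""

    def __init__(self) -> None:
        self._events: list[LedgerEvent] = []

    def append(self, event: LedgerEvent) -> int:
        self._events.append(event)
        return len(self._events) - 1  # list position doubles as a stable event id

    def get(self, event_id: int) -> LedgerEvent:
        return self._events[event_id]

    def __len__(self) -> int:
        return len(self._events)


ledger = RawLedger()
event_id = ledger.append(
    LedgerEvent(
        scope="user:42",
        transaction_time=datetime(2026, 1, 10, tzinfo=timezone.utc),
        valid_time=datetime(2026, 1, 1, tzinfo=timezone.utc),
        observation="I moved to Berlin on January 1",
        action="memory_tool.add(city=Berlin)",
        change=MemoryChange.ADD,
    )
)
```

Note that an UPDATE or DELETE would be a new event appended to the list rather than a mutation of an earlier one, which is what keeps the log usable as an audit trail.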

Derived Views

Views are lossy, query‑oriented structures built on top of the ledger. Typical examples include:

Vector similarity indexes.

Keyword / BM25 inverted indexes.

Temporal Knowledge Graphs (TKG) and timelines.

Skill indexes for executable macros.

Every entry in a view must be traceable back to the raw ledger entries it was derived from ("100% provenance").
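As a rough illustration of a derived view with provenance, the sketch below builds a toy keyword index whose postings are ledger event ids. The dictionary‑based `ledger` and the function names `build_keyword_view` and `lookup` are hypothetical, and plain whitespace tokenization stands in for a real BM25 index.

```python
from collections import defaultdict

# Hypothetical ledger: event id -> text. A real ledger entry would carry more fields.
ledger = {
    0: "user prefers dark mode",
    1: "user moved to Berlin",
    2: "user prefers email over phone",
}


def build_keyword_view(ledger: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase token to the ids of the ledger events containing it."""
    view: dict[str, set[int]] = defaultdict(set)
    for event_id, text in ledger.items():
        for token in text.lower().split():
            view[token].add(event_id)
    return dict(view)


def lookup(view: dict[str, set[int]], ledger: dict[int, str], token: str):
    """Return (event_id, source_text) pairs so every hit cites its ledger entry."""
    return [(eid, ledger[eid]) for eid in sorted(view.get(token.lower(), set()))]


view = build_keyword_view(ledger)
hits = lookup(view, ledger, "prefers")
```

Because the view stores only ids, it can be dropped and rebuilt from the ledger at any time, which is one way to read the claim that views are lossy while the ledger is the source of truth.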

Policy Layer

The policy decides when and how to read, write, update or forget memory. Decisions are expressed as explicit action sequences (e.g., ADD, UPDATE, DELETE, NONE) rather than hidden prompts. The policy is a trainable, auditable component that can be optimized via RL or supervised fine‑tuning.
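To make "explicit action sequences" concrete, here is a minimal rule‑based sketch. It is not a trainable policy; the names `MemoryAction`, `decide` and `apply` are hypothetical, and a learned policy would replace the hand‑written rules while keeping the same observable action record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryAction:
    op: str                 # one of ADD, UPDATE, DELETE, NONE
    key: str
    value: Optional[str]
    reason: str             # human-readable justification for the audit log


def decide(store: dict[str, str], key: str, new_value: Optional[str]) -> MemoryAction:
    """Toy rule-based policy: compare an observed fact with what memory holds."""
    old = store.get(key)
    if new_value is None:
        if old is None:
            return MemoryAction("NONE", key, None, "nothing stored, nothing to retract")
        return MemoryAction("DELETE", key, None, "fact retracted")
    if old is None:
        return MemoryAction("ADD", key, new_value, "new fact")
    if old != new_value:
        return MemoryAction("UPDATE", key, new_value, f"replaces {old!r}")
    return MemoryAction("NONE", key, new_value, "already stored")


def apply(store: dict[str, str], action: MemoryAction) -> None:
    """Commit an action; NONE leaves the store untouched."""
    if action.op in ("ADD", "UPDATE"):
        store[action.key] = action.value
    elif action.op == "DELETE":
        store.pop(action.key, None)


store: dict[str, str] = {}
audit_log = []
for key, value in [("city", "Paris"), ("city", "Paris"), ("city", "Berlin"), ("city", None)]:
    action = decide(store, key, value)
    apply(store, action)
    audit_log.append(action.op)
```

Separating `decide` from `apply` is the design point: the decision is a data record that can be logged, replayed or used as a training label before anything is committed.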

System 1 + System 2 Design

System 1 is the general LLM/agent that performs inference, planning and tool use. System 2 is a slow, external loop that handles memory read/write, retrieval and policy execution. System 2 makes memory operations observable, replayable and A/B‑testable, allowing the agent to retain its generic capabilities while gaining personalized, long‑term knowledge.

Parametric vs. Non‑Parametric Memory

Parametric memory: experiences are compiled into model weights through training or fine‑tuning; inference uses the updated weights directly.

Non‑parametric memory: experiences are stored externally (ledger + views + skill pool) and injected at inference time via retrieval and aggregation. The LLM weights stay unchanged.

Non‑parametric memory can approximate parametric fine‑tuning by retrieving similar trajectories and applying an additive correction Δ to the logits, as demonstrated in JitRL and UMEM.
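The additive correction can be illustrated with a toy calculation. This is only a sketch of the general idea and does not reproduce the JitRL or UMEM algorithms: the rule used here (Δ for an action is the mean advantage of retrieved cases that took it, scaled by a hypothetical `beta`) is an assumption chosen for simplicity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def delta_from_memory(retrieved, num_actions, beta=1.0):
    """Average the advantages recorded for each action in retrieved trajectories.

    `retrieved` is a list of (action_index, advantage) pairs; actions with no
    retrieved evidence get a correction of zero.
    """
    sums = [0.0] * num_actions
    counts = [0] * num_actions
    for action, advantage in retrieved:
        sums[action] += advantage
        counts[action] += 1
    return [beta * s / c if c else 0.0 for s, c in zip(sums, counts)]


base_logits = [1.0, 1.0, 0.0]                 # frozen model's scores for 3 actions
retrieved = [(0, -1.0), (1, 2.0), (1, 1.0)]   # similar past cases: action 1 worked
delta = delta_from_memory(retrieved, num_actions=3)
corrected = [logit + d for logit, d in zip(base_logits, delta)]
probs_before = softmax(base_logits)
probs_after = softmax(corrected)
```

With these numbers, actions 0 and 1 start out tied, and the correction shifts probability toward action 1 without touching any model weight. The quality of Δ depends entirely on what was retrieved, which motivates the bottlenecks discussed next.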

Upper‑Bound Analysis

The performance ceiling of non‑parametric memory is limited by three bottlenecks:

Interface bandwidth: the amount of external information that can be injected into the LLM is bounded by token limits, latency and attention capacity.

Retrieval & aggregation error: views are approximations; retrieval noise, missed hits and temporal conflicts degrade the correction Δ.

Policy learnability & controllability: the policy must learn when to read/write and must be auditable; delayed credit assignment and retroactive corrections make learning difficult.
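The interface‑bandwidth bottleneck can be made tangible with a small packing example. The sketch below greedily selects memory snippets by relevance per token under a fixed budget; `pack_context` is a hypothetical helper, the relevance scores are made up for illustration, and counting whitespace‑separated words is a stand‑in for a real tokenizer.

```python
def pack_context(candidates, budget_tokens):
    """Greedily pick memory snippets by relevance per token until the budget is spent.

    `candidates` is a list of (text, relevance) pairs. Token cost is approximated
    by the number of whitespace-separated words (an assumption, not a tokenizer).
    """
    def cost(text):
        return len(text.split())

    usable = [c for c in candidates if cost(c[0]) > 0]   # skip empty snippets
    ranked = sorted(usable, key=lambda c: c[1] / cost(c[0]), reverse=True)
    chosen, used = [], 0
    for text, _relevance in ranked:
        if used + cost(text) <= budget_tokens:
            chosen.append(text)
            used += cost(text)
    return chosen, used


candidates = [
    ("user lives in Berlin", 0.9),
    ("user once mentioned liking long walks along the river Spree in summer", 0.6),
    ("prefers email", 0.5),
]
chosen, used = pack_context(candidates, budget_tokens=6)
```

Whatever does not fit the budget simply never reaches the model, so no amount of stored history helps beyond what the interface can carry per decision.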

Temporal Dimension

A bi‑temporal ledger distinguishes transaction_time (when the system recorded an event) from valid_time (when the fact is true). Views must respect valid‑time slices, and the policy enforces hard gating so that outdated facts are never treated as current.
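A minimal sketch of a valid‑time slice over bi‑temporal records follows. The `Fact` record and the `as_of` function are hypothetical names; the tie‑break rule (prefer the most recently recorded fact when several cover the same instant) is an assumption, not something the article specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    valid_from: date            # valid_time: when the fact became true
    valid_to: Optional[date]    # None means "still true as far as we know"
    recorded_on: date           # transaction_time: when the system learned it


def as_of(facts, key, when, known_by=None):
    """Return the value valid at `when`, using only facts recorded by `known_by`."""
    candidates = [
        f for f in facts
        if f.key == key
        and f.valid_from <= when
        and (f.valid_to is None or when < f.valid_to)
        and (known_by is None or f.recorded_on <= known_by)
    ]
    if not candidates:
        return None
    # If several records cover the same instant, prefer the most recently recorded.
    return max(candidates, key=lambda f: f.recorded_on).value


facts = [
    # Recorded in January 2024: the user lives in Paris, with no known end date.
    Fact("city", "Paris", date(2024, 1, 1), None, date(2024, 1, 5)),
    # Recorded in July 2025: the Paris period ended on June 1 and Berlin began.
    Fact("city", "Paris", date(2024, 1, 1), date(2025, 6, 1), date(2025, 7, 10)),
    Fact("city", "Berlin", date(2025, 6, 1), None, date(2025, 7, 10)),
]

current = as_of(facts, "city", date(2025, 6, 15))
believed_then = as_of(facts, "city", date(2025, 6, 15), known_by=date(2025, 6, 20))
```

Here `current` is "Berlin" while `believed_then` is "Paris": the move was already true on June 15 but had not yet been recorded on June 20. Keeping both time axes lets the system answer "what is true" and "what did we know" separately, and the filter in `as_of` acts as the hard gate that keeps the superseded Paris record from being returned as current.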

Procedural Memory and Skills

Beyond declarative facts, agents need procedural memory – executable skills. Systems such as ProcMEM treat successful interaction trajectories as macro‑skills, evaluate them with non‑parametric PPO, and maintain a skill pool based on online scores, decay and redundancy removal. Skills are stored as skill units that can be invoked by the policy.
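The pool maintenance described above (online scores, decay, redundancy removal) can be sketched as follows. This does not reproduce ProcMEM and contains no PPO step: `Skill` and `SkillPool` are hypothetical names, the score update is a plain exponential moving average, and redundancy is detected by exact step‑sequence equality, all of which are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    steps: tuple[str, ...]     # the macro: an ordered sequence of primitive actions
    score: float = 0.0


class SkillPool:
    def __init__(self, decay: float = 0.9, min_score: float = 0.1) -> None:
        self.decay = decay
        self.min_score = min_score
        self.skills: dict[str, Skill] = {}

    def add(self, skill: Skill) -> bool:
        """Reject a skill whose step sequence duplicates one already in the pool."""
        if any(s.steps == skill.steps for s in self.skills.values()):
            return False
        self.skills[skill.name] = skill
        return True

    def report(self, name: str, reward: float) -> None:
        """Blend a new outcome into the running score (exponential moving average)."""
        s = self.skills[name]
        s.score = self.decay * s.score + (1 - self.decay) * reward

    def tick(self) -> None:
        """Decay all scores, then evict skills that fell below the threshold."""
        for s in self.skills.values():
            s.score *= self.decay
        self.skills = {n: s for n, s in self.skills.items() if s.score >= self.min_score}


pool = SkillPool(decay=0.5, min_score=0.1)
login_steps = ("open_page", "type_credentials", "submit")
added_first = pool.add(Skill("login", login_steps, score=1.0))
added_duplicate = pool.add(Skill("login_v2", login_steps, score=1.0))
pool.add(Skill("export", ("open_menu", "click_export"), score=0.15))
pool.report("login", 1.0)   # a successful use keeps the score at 1.0
pool.tick()                 # login decays to 0.5, export to 0.075 and is evicted
```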

Integration Layer & Machine‑Native Tokens

Recent work explores "machine‑native" memory tokens that bypass text tokenization and are injected directly into the transformer’s KV‑cache:

LycheeMemory trains a compressor that turns retrieved chunks into latent tokens compatible with the KV‑cache, reducing encoding overhead.

MemAdapter aligns heterogeneous memory (text, graphs, skills) to the LLM’s semantic space without fine‑tuning.

While these approaches improve throughput, they raise governance challenges: selection/gating, provenance, and rollback must still be enforced.

Architectural Summary (Five‑Component Model)

Kernel (System 2 control plane): schedules retrieval, writes, updates and forgetting; exposes decisions as trainable, auditable actions.

File system (storage plane): raw ledger plus layered views with temporal consistency and traceability.

Executable layer (skill plane): procedural units that are verifiable, governable and reusable across tasks.

Bus interface (context bridge): injects external state into the model efficiently while supporting observability.

Learning engine (online adaptation): converts interaction feedback into improvements without altering model weights (e.g., advantage modulation, skill evolution).

Key Code Illustration

+-------------------+      +---------------------------+      +-------------------------+
|   User / Env IO   | ---> | System 1: General Agent   | ---> | Output / Effect         |
+-------------------+      | (LLM + tools + planner)   |      | (final answer / action) |
                           +---------------------------+      +-------------------------+
                                         ^
                                         | retrieved_context + provenance
                                         |
                                         | memory_tool(query, ctx)
                                         v
+-----------------------------------------------------------------------------------+
|                       System 2: Agentic Memory (Slow Loop)                        |
+-----------------------------------------------------------------------------------+
| PreThink → Retrieve (loop) → Evidence Accumulate → Early Stop(conf >= tau)        |
|               |                          |                          |             |
|               v                          v                          v             |
|   +----------------------+   +----------------------+   +----------------------+  |
|   |   Memory Infra       |   |   Raw Ledger         |   |   Derived Views      |  |
|   |   (ADD/UPDATE/DELETE)|   |   Append‑only events |   |   Vector / Keyword   |  |
|   +----------------------+   +----------------------+   +----------------------+  |
+-----------------------------------------------------------------------------------+
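The slow loop in the diagram (Retrieve, Evidence Accumulate, Early Stop at conf >= tau) can be sketched in a few lines. Everything here is illustrative: `retrieve_with_early_stop` is a hypothetical function, the views are plain lists of (id, text) pairs, the query terms stand in for the output of a PreThink step, and confidence is approximated by the fraction of query terms covered, where a real system would use a learned or calibrated score.

```python
def retrieve_with_early_stop(query_terms, sources, tau=0.8, max_rounds=5):
    """Query sources one at a time, accumulating evidence until confidence >= tau.

    `sources` is a list of (view_name, [(entry_id, text), ...]) pairs.
    Confidence is the fraction of query terms covered by the evidence so far.
    """
    wanted = {t.lower() for t in query_terms}
    evidence, covered, rounds = [], set(), 0
    confidence = 0.0
    for name, documents in sources[:max_rounds]:
        rounds += 1
        for entry_id, text in documents:
            hit = wanted & set(text.lower().split())
            if hit - covered:                       # keep only evidence that adds coverage
                evidence.append((name, entry_id))   # provenance: which view, which entry
                covered |= hit
        confidence = len(covered) / len(wanted) if wanted else 1.0
        if confidence >= tau:
            break                                   # early stop: enough evidence gathered
    return evidence, confidence, rounds


sources = [
    ("keyword_view", [(0, "user lives in berlin")]),
    ("vector_view", [(7, "user prefers vegetarian food")]),
    ("timeline_view", [(9, "user visited rome")]),
]
evidence, confidence, rounds = retrieve_with_early_stop(
    ["berlin", "vegetarian"], sources, tau=0.8
)
```

In this run the loop stops after two rounds, so the third view is never queried. Returning the (view, entry) pairs alongside the confidence is what lets System 1 receive "retrieved_context + provenance" rather than bare text.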

Conclusions

Memory is a closed‑loop system (Ledger → Views → Policy → Commit → Provenance). Missing any component breaks governance, usability or sustainability.

System 2 is essential for scalable, plug‑in memory that remains orthogonal to the LLM’s core abilities.

The ceiling of non‑parametric memory is set by interface bandwidth, view approximation error and policy controllability.

Bi‑temporal structures and time‑sliced recall are hard constraints, not optional metadata.

The five‑component architecture abstracts the required modules without prescribing concrete implementations.

