The True Nature of Agent Memory: Deep Dive into Architecture and Design

The article analyses why a real agent must have memory, defining memory as an external state that feeds decision‑making, proposing a three‑part architecture (Raw Ledger, Views, Policy), contrasting parametric and non‑parametric approaches, and detailing bottlenecks, temporal handling, and procedural extensions.

dbaplus Community

May 5, 2026

Memory Essence

Memory is the long‑term, retrievable record of interactions that can be leveraged by the decision layer. Its value lies in the channel that transforms historical events into evidence, summaries, sub‑graphs, or executable skills that influence the current decision distribution.

Proposition A: Memory is not merely "storage" but an external state from which the policy layer must extract usable information before producing an output.

Proposition B: The minimal viable memory consists of three components: the Raw Ledger (an authoritative append‑only log), Derived Views (retrievable, possibly lossy representations such as vector indexes, keyword indexes, or temporal knowledge‑graph views), and Policy (the control loop that decides when and how to read, write, update, or forget). Missing any of these breaks provenance, observability, or sustainability.

Proposition C: The basic unit of memory is an event sequence, but a raw event stream alone is too low‑level; useful memory requires transformation into views and governance by policy.
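To make Proposition B concrete, here is a minimal Python sketch (not taken from the source) of the three components: an append‑only ledger, a keyword index as one derived view that can always be rebuilt from the ledger, and a one‑line write policy. The class names and the three‑word write rule are hypothetical; a production view would more likely be a vector index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    """One immutable entry in the Raw Ledger."""
    event_id: int
    text: str


@dataclass
class RawLedger:
    """Authoritative append-only log: entries are never edited or deleted."""
    _events: List[Event] = field(default_factory=list)

    def append(self, text: str) -> Event:
        event = Event(event_id=len(self._events), text=text)
        self._events.append(event)
        return event

    def events(self) -> List[Event]:
        return list(self._events)  # return a copy so callers cannot mutate the log


class KeywordView:
    """Lossy derived view: token -> ids of ledger events containing that token."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[int]] = {}

    def ingest(self, event: Event) -> None:
        for token in event.text.lower().split():
            self.index.setdefault(token, set()).add(event.event_id)

    def lookup(self, token: str) -> Set[int]:
        return self.index.get(token.lower(), set())

    @classmethod
    def rebuild(cls, ledger: RawLedger) -> "KeywordView":
        """A view can always be regenerated from the ledger, which stays authoritative."""
        view = cls()
        for event in ledger.events():
            view.ingest(event)
        return view


def should_write(text: str) -> bool:
    """Hypothetical write policy: skip trivial observations."""
    return len(text.split()) >= 3


ledger = RawLedger()
view = KeywordView()
for observation in ["ok", "user prefers dark mode", "user moved to Berlin"]:
    if should_write(observation):
        view.ingest(ledger.append(observation))

print(sorted(view.lookup("user")))                      # [0, 1]
print(KeywordView.rebuild(ledger).index == view.index)  # True
```

Because the view only stores event ids, every hit points back to a ledger entry, which is what preserves provenance when the view itself is lossy.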

System 1 + System 2 Design

System 1 is the fast, weight‑based LLM that performs general reasoning, planning, and tool use. System 2 is an explicit slow loop that handles memory write, retrieval, and update, making memory operations observable, replayable, and A/B‑testable. Decoupling memory from the LLM weights preserves the LLM’s generalization while allowing targeted, plug‑in memory improvements.

Non‑parametric memory stores experience in external state (ledger + views + skill pool) and applies policy‑driven writes. Parametric memory embeds experience directly into model weights via fine‑tuning. The key difference is where the adaptation cost is incurred: training time for parametric memory versus online write/read time for non‑parametric memory.

Non‑Parametric Memory Upper Bound

Interface bandwidth: The amount of information that can be injected into the LLM is limited by token count, KV‑cache size, and latency; the memory must therefore compress or prioritize what it injects.

Retrieval & aggregation error: Views are approximations of the ledger; errors such as noise, stale facts, or temporal conflicts directly corrupt the correction term Δ that memory applies to the LLM's logits.

Policy learnability & controllability: The policy must produce explicit, auditable action sequences (ADD/UPDATE/DELETE); a poor policy leads to polluted writes, missed reads, or irreversible errors.

Empirical evidence from JitRL, InfMem, and UMEM shows that adaptive early stopping, reward‑guided retrieval, and bi‑temporal indexing can mitigate these bottlenecks.

Temporal Dimension

Time is treated as a first‑class structural dimension, not mere metadata. A bi‑temporal ledger records both transaction_time (when the system wrote the fact) and valid_time (when the fact is true in the world). Views must respect time slices, and policy must decide when to query historical versus current facts, preventing errors in which outdated facts are served as current.
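A minimal sketch of a bi‑temporal query, assuming facts are only ever appended and that a newer record with the same valid_from supersedes an older one. The `Fact` fields and the `as_of` function are illustrative names, not an API from the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: date          # valid_time start: when the fact becomes true in the world
    valid_to: Optional[date]  # valid_time end (exclusive); None means still true
    recorded_at: date         # transaction_time: when the system wrote this record


def as_of(facts: List[Fact], subject: str, world_time: date, system_time: date) -> Optional[str]:
    """Value of `subject` at world_time, as the system knew it at system_time."""
    # 1. Transaction-time slice: ignore records written after system_time.
    known = [f for f in facts if f.subject == subject and f.recorded_at <= system_time]
    # 2. A later record with the same valid_from supersedes an earlier one.
    latest: Dict[date, Fact] = {}
    for f in sorted(known, key=lambda r: r.recorded_at):
        latest[f.valid_from] = f
    # 3. Valid-time slice: return the version that is true at world_time.
    for f in latest.values():
        if f.valid_from <= world_time and (f.valid_to is None or world_time < f.valid_to):
            return f.value
    return None


ledger = [
    Fact("city", "Paris", date(2020, 1, 1), None, recorded_at=date(2020, 1, 2)),
    # The move is learned later: close the Paris interval and open a Berlin one (append-only).
    Fact("city", "Paris", date(2020, 1, 1), date(2024, 6, 1), recorded_at=date(2024, 6, 10)),
    Fact("city", "Berlin", date(2024, 6, 1), None, recorded_at=date(2024, 6, 10)),
]

print(as_of(ledger, "city", date(2025, 1, 1), date(2025, 1, 1)))  # Berlin: current fact
print(as_of(ledger, "city", date(2023, 1, 1), date(2025, 1, 1)))  # Paris: historical fact
print(as_of(ledger, "city", date(2025, 1, 1), date(2023, 1, 1)))  # Paris: what the system believed in 2023
```

The third query shows why transaction_time matters: replaying with the system's 2023 knowledge reproduces its stale belief, which is exactly the "outdated fact as current" failure the first query avoids.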

Procedural Memory Layer

Beyond declarative facts, the system stores procedural knowledge as skills or macros. Works such as ProcMEM and AgeMem formalize skills as (trigger, execution, termination) triples and use non‑parametric PPO to evaluate and evolve them without touching LLM weights. Skills are maintained in an online scoring pool that prunes low‑value or redundant macros.
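The (trigger, execution, termination) structure and an online scoring pool can be sketched as follows. This is an assumption‑laden illustration: the exponential moving average is a deliberately simple stand‑in for the PPO‑based evaluation the source attributes to ProcMEM and AgeMem, the thresholds `prune_below` and `min_uses` are hypothetical, and redundancy detection is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Skill:
    trigger: Callable[[int], bool]    # precondition: should this macro fire on the state?
    execute: Callable[[int], int]     # one step of the macro
    terminate: Callable[[int], bool]  # stop condition
    score: float = 0.5                # running value estimate in [0, 1]
    uses: int = 0


class SkillPool:
    """Hypothetical online-scored skill pool; EMA scoring stands in for PPO-style evaluation."""

    def __init__(self, lr: float = 0.5, prune_below: float = 0.3, min_uses: int = 2) -> None:
        self.skills: Dict[str, Skill] = {}
        self.lr, self.prune_below, self.min_uses = lr, prune_below, min_uses

    def run(self, name: str, state: int, max_steps: int = 100) -> Optional[int]:
        skill = self.skills[name]
        if not skill.trigger(state):
            return None  # macro not applicable here
        for _ in range(max_steps):
            if skill.terminate(state):
                break
            state = skill.execute(state)
        return state

    def record(self, name: str, reward: float) -> None:
        skill = self.skills[name]
        skill.uses += 1
        skill.score += self.lr * (reward - skill.score)  # exponential moving average

    def prune(self) -> None:
        """Drop skills that have been tried enough times and still score low."""
        self.skills = {
            n: s for n, s in self.skills.items()
            if s.uses < self.min_uses or s.score >= self.prune_below
        }


pool = SkillPool()
pool.skills["count_to_10"] = Skill(lambda s: s < 10, lambda s: s + 1, lambda s: s >= 10)
pool.skills["flaky"] = Skill(lambda s: True, lambda s: s, lambda s: True)

print(pool.run("count_to_10", 3))  # 10
for reward in (1.0, 1.0):
    pool.record("count_to_10", reward)
for reward in (0.0, 0.0):
    pool.record("flaky", reward)
pool.prune()
print(sorted(pool.skills))  # ['count_to_10']
```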

Integration Layer

External memory must be represented, aligned, and governed before injection. LycheeMemory compresses external chunks into latent tokens that can be fed directly into the KV‑cache, while MemAdapter aligns heterogeneous structures (graphs, skills) to the LLM’s semantic space. Governance requires provenance links from each injected token back to the original ledger entry.

Five‑Component Blueprint

Kernel / Control Plane: System 2 scheduler that emits readable action sequences.

File System / Storage Plane: Raw Ledger + Views with temporal consistency and hierarchical consolidation.

Executable / Skill Plane: Stored macros that are executable, verifiable, and governable.

Interface / Context Bridge: Token‑budget‑aware injection mechanism (memory tokens, latent tokens) with observability.

Learning Engine / Online Adaptation: Continuous reward‑driven updates to retrieval, policy, and skill pools without weight updates.

Parametric vs Non‑Parametric Memory

Parametric memory writes experience into model weights (training / fine‑tuning). Non‑parametric memory writes experience into external state (ledger + views + skill pool) and influences the LLM at inference time via retrieval and injection. The adaptation cost shifts from offline training to online commit and retrieval, enabling plug‑in improvements and A/B testing.

Memory‑Driven Logit Modulation

When the LLM produces logits, an external memory can apply a controllable correction term Δ. This Δ is derived from retrieved evidence and can be implemented as an additive advantage modulation (e.g., JitRL).
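One way to read "additive advantage modulation" is z'(a) = z(a) + β·Δ(a), where z are the LLM's logits, Δ(a) is an advantage estimate for action a derived from retrieved evidence, and β is a scale. The sketch below assumes exactly that form; it is an illustration, not JitRL's published update rule.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulate(logits: List[float], delta: Dict[int, float], beta: float = 1.0) -> List[float]:
    """z'(a) = z(a) + beta * delta(a); actions without retrieved evidence get delta = 0."""
    return [z + beta * delta.get(a, 0.0) for a, z in enumerate(logits)]


base = [2.0, 1.0, 0.5]  # LLM logits for three candidate actions
delta = {1: 2.0}        # hypothetical: memory says action 1 paid off in similar past episodes
adjusted = modulate(base, delta)

print(adjusted)                                  # [2.0, 3.0, 0.5]
print(max(range(3), key=lambda a: base[a]))      # 0
print(max(range(3), key=lambda a: adjusted[a]))  # 1
print(modulate(base, delta, beta=0.0) == base)   # True
```

Because Δ enters additively behind an explicit β, setting β = 0 recovers the unmodified model, which is what makes such a correction controllable and easy to A/B‑test.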

Policy Requirements

The Memory Algorithm Protocol forces policy outputs into explicit action sequences (ADD/UPDATE/DELETE/NONE), requires that UPDATE/DELETE target only items in a candidate set, and requires that retrieval results carry provenance. Policy must therefore be both learnable (e.g., trained with RL using GRPO) and governable (auditable actions, sandbox replay).
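A minimal sketch of how a store might enforce these constraints: each action is validated against the candidate set before it is applied, and every attempt, accepted or rejected, is appended to an audit log. The `MemoryAction`, `validate`, and `MemoryStore` names are hypothetical, not part of the protocol's published interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

OPS = {"ADD", "UPDATE", "DELETE", "NONE"}


@dataclass(frozen=True)
class MemoryAction:
    op: str
    target_id: Optional[str] = None  # required for UPDATE / DELETE
    content: Optional[str] = None    # required for ADD / UPDATE


def validate(action: MemoryAction, candidates: Set[str]) -> List[str]:
    """Return protocol violations; an empty list means the action may be applied."""
    errors = []
    if action.op not in OPS:
        errors.append(f"unknown op {action.op!r}")
    if action.op in {"UPDATE", "DELETE"} and action.target_id not in candidates:
        errors.append(f"target {action.target_id!r} not in candidate set")
    if action.op in {"ADD", "UPDATE"} and not action.content:
        errors.append("missing content")
    return errors


class MemoryStore:
    def __init__(self) -> None:
        self.items: Dict[str, str] = {}
        self.audit_log: List[str] = []
        self._next_id = 0

    def apply(self, action: MemoryAction, candidates: Set[str]) -> bool:
        errors = validate(action, candidates)
        status = "rejected: " + "; ".join(errors) if errors else "ok"
        self.audit_log.append(f"{action.op} {action.target_id} {status}")  # log every attempt
        if errors:
            return False
        if action.op == "ADD":
            self.items[f"m{self._next_id}"] = action.content
            self._next_id += 1
        elif action.op == "UPDATE":
            self.items[action.target_id] = action.content
        elif action.op == "DELETE":
            self.items.pop(action.target_id, None)
        return True  # NONE is a valid, logged no-op


store = MemoryStore()
store.apply(MemoryAction("ADD", content="user lives in Paris"), candidates=set())
candidates = {"m0"}  # e.g. ids surfaced by retrieval for this turn
print(store.apply(MemoryAction("UPDATE", "m0", "user lives in Berlin"), candidates))  # True
print(store.apply(MemoryAction("DELETE", "m7"), candidates))                          # False
print(store.items)  # {'m0': 'user lives in Berlin'}
```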

Upper‑Bound Bottlenecks in Detail

Interface bandwidth: Token budget, attention length, and latency limit how much evidence can be injected. Compression, hierarchical memory, and latent tokens aim to increase information density per token.

Retrieval & aggregation error: Views are approximations; noise, stale facts, and temporal conflicts corrupt Δ. Mitigations include bi‑temporal indexing (UMEM) and adaptive early stopping (InfMem), which achieved a 3.9× speedup.

Policy learnability & controllability: Policy must balance write frequency, recall quality, and forgetting. Errors such as overwriting, noisy writes, or missed reads can cause irreversible degradation. Effective policy requires RL training (AgeMem, InfMem) and explicit action logging.
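For the interface‑bandwidth bottleneck, the simplest budget‑aware injection is greedy selection by relevance per token, with each injected snippet tagged by its ledger id so provenance survives. This is a heuristic sketch under assumed inputs (precomputed relevance scores and token counts), not a method from the cited works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    ledger_id: str  # provenance: id of the Raw Ledger entry this snippet came from
    text: str
    score: float    # retrieval relevance
    tokens: int     # context-window cost (assumed precomputed and > 0)


def pack(candidates: List[Evidence], budget: int) -> List[Evidence]:
    """Greedy by relevance per token; a heuristic, not an optimal knapsack solution."""
    chosen: List[Evidence] = []
    used = 0
    for ev in sorted(candidates, key=lambda e: e.score / e.tokens, reverse=True):
        if used + ev.tokens <= budget:
            chosen.append(ev)
            used += ev.tokens
    return chosen


def render(chosen: List[Evidence]) -> str:
    """Prefix each snippet with its ledger id so injected context stays traceable."""
    return "\n".join(f"[{ev.ledger_id}] {ev.text}" for ev in chosen)


retrieved = [
    Evidence("e1", "full meeting transcript", score=0.9, tokens=300),
    Evidence("e2", "user prefers metric units", score=0.6, tokens=50),
    Evidence("e3", "project deadline is Friday", score=0.5, tokens=100),
    Evidence("e4", "archived chat log", score=0.2, tokens=400),
]
print(render(pack(retrieved, budget=200)))
# [e2] user prefers metric units
# [e3] project deadline is Friday
```

Note that the single most relevant item (e1) is dropped because it alone would consume more than the budget; compression or hierarchical summaries are what would let such evidence back in.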

Memory System Control Layer (Policy)

Static rubric‑based rules are insufficient. Two viable approaches are:

Training an external neural network to output actions.

Prompt‑, SFT‑, or RL‑tuning a language model to act as the controller. This second approach aligns with the GRPO training paradigm and has been demonstrated in AgeMem and InfMem.

InfMem’s PreThink‑Retrieve‑Write protocol adds a “pre‑think” step that evaluates whether internal knowledge suffices before external retrieval, reducing latency.
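The PreThink, retrieve‑loop, and early‑stop steps can be sketched as a small control loop that stops once accumulated confidence reaches a threshold τ. Here `pre_think` and `retrieve` are hypothetical callables supplied by the caller, the Write step is omitted, and the values of `tau` and `max_rounds` are arbitrary; this is not InfMem's actual implementation.

```python
from typing import Callable, List, Tuple


def pre_think_retrieve(
    query: str,
    pre_think: Callable[[str], float],                  # hypothetical: confidence from internal knowledge alone
    retrieve: Callable[[str, int], Tuple[str, float]],  # hypothetical: (evidence chunk, confidence after this round)
    tau: float = 0.8,
    max_rounds: int = 5,
) -> Tuple[List[str], int]:
    """Return (accumulated evidence, retrieval rounds used)."""
    if pre_think(query) >= tau:
        return [], 0  # internal knowledge suffices: skip external retrieval
    evidence: List[str] = []
    for round_idx in range(max_rounds):
        chunk, confidence = retrieve(query, round_idx)
        evidence.append(chunk)  # evidence accumulation
        if confidence >= tau:   # adaptive early stop
            return evidence, round_idx + 1
    return evidence, max_rounds  # budget exhausted without reaching tau


# Toy retriever whose confidence rises as evidence accumulates.
CONFIDENCES = [0.4, 0.7, 0.9, 0.95]


def fake_retrieve(query: str, round_idx: int) -> Tuple[str, float]:
    return f"chunk-{round_idx}", CONFIDENCES[min(round_idx, len(CONFIDENCES) - 1)]


print(pre_think_retrieve("q", lambda q: 0.2, fake_retrieve))   # (['chunk-0', 'chunk-1', 'chunk-2'], 3)
print(pre_think_retrieve("q", lambda q: 0.95, fake_retrieve))  # ([], 0)
```

The latency saving comes from both exits: a confident PreThink skips retrieval entirely, and the early stop avoids rounds that would not change the decision.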

Memory Unit Structure and Compression

SimpleMem stores compressed memory units rather than raw text. Each unit has an embedding vector; an affinity score determines similarity. Recursive consolidation merges high‑affinity units into higher‑level abstractions, achieving significant token‑consumption reduction while preserving provenance.
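A minimal sketch of recursive consolidation, assuming cosine similarity as the affinity score and a vector average plus string concatenation as the merge step; SimpleMem's actual affinity function and abstraction step are not reproduced here, and a real system would summarize merged units with an LLM. Embeddings are assumed to be non‑zero.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryUnit:
    summary: str
    embedding: List[float]
    sources: Tuple[str, ...]  # provenance: ledger ids folded into this unit


def affinity(a: List[float], b: List[float]) -> float:
    """Cosine similarity, used here as the affinity score (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def merge(a: MemoryUnit, b: MemoryUnit) -> MemoryUnit:
    # Placeholder abstraction: average the vectors, concatenate summaries, union provenance.
    centroid = [(x + y) / 2 for x, y in zip(a.embedding, b.embedding)]
    return MemoryUnit(f"{a.summary} | {b.summary}", centroid, a.sources + b.sources)


def consolidate(units: List[MemoryUnit], threshold: float) -> List[MemoryUnit]:
    """Repeatedly merge the highest-affinity pair until no pair reaches the threshold."""
    units = list(units)
    while len(units) > 1:
        best, pair = -1.0, (0, 0)
        for i in range(len(units)):
            for j in range(i + 1, len(units)):
                s = affinity(units[i].embedding, units[j].embedding)
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = merge(units[i], units[j])  # merged units can themselves be merged again later
        units = [u for k, u in enumerate(units) if k not in (i, j)] + [merged]
    return units


result = consolidate([
    MemoryUnit("likes coffee", [1.0, 0.0], ("e1",)),
    MemoryUnit("drinks espresso daily", [0.9, 0.1], ("e2",)),
    MemoryUnit("lives in Berlin", [0.0, 1.0], ("e3",)),
], threshold=0.9)
for unit in result:
    print(unit.summary, unit.sources)
```

The two coffee‑related units collapse into one abstraction whose `sources` still lists both original ledger ids, so compression does not cost provenance.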

Temporal memory (Zep/Graphiti) adds valid_time to edges, enabling time‑sliced retrieval. Experiments report an 18.5% accuracy gain over traditional RAG on long‑sequence benchmarks.

Temporal × Policy Interactions

Validity gating: Facts outside the query's time window are excluded.

Tombstone: Explicit revocation of a fact for a time interval while preserving auditability.

Decay: Adjustable weight for still‑valid facts, used as a soft ranking signal.

The order of application (validity → tombstone → decay) prevents stale facts from re‑emerging.
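The validity → tombstone → decay ordering can be sketched as two hard filters followed by a soft ranking. Times are plain numbers (days) and decay is an exponential half‑life; both are assumptions, as are the `Fact`, `Tombstone`, and `rank` names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    fact_id: str
    valid_from: float          # valid_time start, in days (assumed unit)
    valid_to: Optional[float]  # exclusive end; None = still valid
    relevance: float


@dataclass(frozen=True)
class Tombstone:
    fact_id: str
    start: float  # fact revoked for [start, end); the tombstone itself stays in the ledger for audit
    end: float


def rank(facts: List[Fact], tombstones: List[Tombstone], t: float, half_life: float = 30.0) -> List[str]:
    # 1. Validity gating: hard filter on the query time.
    valid = [f for f in facts if f.valid_from <= t and (f.valid_to is None or t < f.valid_to)]
    # 2. Tombstones: hard filter on explicit revocations covering t.
    revoked = {ts.fact_id for ts in tombstones if ts.start <= t < ts.end}
    alive = [f for f in valid if f.fact_id not in revoked]

    # 3. Decay: soft ranking among survivors only, so it can reorder but never resurrect.
    def weight(f: Fact) -> float:
        return f.relevance * 0.5 ** ((t - f.valid_from) / half_life)

    return [f.fact_id for f in sorted(alive, key=weight, reverse=True)]


facts = [
    Fact("office_paris", 0, 60, relevance=0.9),      # expired at day 60
    Fact("office_berlin", 60, None, relevance=0.8),
    Fact("prefers_email", 10, None, relevance=0.9),  # valid but old
    Fact("on_leave", 95, None, relevance=0.7),       # revoked by the tombstone below
]
tombstones = [Tombstone("on_leave", 90, 200)]
print(rank(facts, tombstones, t=100))  # ['office_berlin', 'prefers_email']
```

If decay were applied before the hard filters, a highly relevant but expired fact such as `office_paris` could outrank valid ones; filtering first keeps it out regardless of its score.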

Temporal × Skill Management

last_verified: Timestamp of the last successful execution of a skill.

applicability window: Time interval during which the skill's environment remains valid.

Policy may trigger re‑validation when the query time diverges from last_verified.
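A minimal sketch of such a re‑validation trigger, combining both fields: re‑validate when the query time falls outside the applicability window or lies too far from last_verified. The `max_staleness` threshold and the `SkillRecord` and `needs_revalidation` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SkillRecord:
    name: str
    last_verified: datetime         # last successful execution
    window_start: datetime          # applicability window start
    window_end: Optional[datetime]  # exclusive end; None = no known expiry


def needs_revalidation(skill: SkillRecord, query_time: datetime,
                       max_staleness: timedelta = timedelta(days=30)) -> bool:
    """Hypothetical trigger: outside the applicability window, or too far from last_verified."""
    outside_window = query_time < skill.window_start or (
        skill.window_end is not None and query_time >= skill.window_end
    )
    stale = abs(query_time - skill.last_verified) > max_staleness
    return outside_window or stale


deploy = SkillRecord("deploy_service", datetime(2026, 4, 20), datetime(2026, 1, 1), datetime(2027, 1, 1))
print(needs_revalidation(deploy, datetime(2026, 5, 1)))  # False: in window, verified 11 days earlier
print(needs_revalidation(deploy, datetime(2026, 8, 1)))  # True: last verified over 30 days before
print(needs_revalidation(deploy, datetime(2027, 2, 1)))  # True: outside the applicability window
```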

Memory Tokens: Representation, Alignment, Governance

LycheeMemory proposes latent “memory tokens” that are injected directly into the KV‑cache, bypassing text tokenization. MemAdapter aligns heterogeneous memory (KG, skills) to the LLM’s semantic space, enabling zero‑shot use without fine‑tuning. Both approaches require provenance links and selection/gating mechanisms to stay within token budgets and to ensure observability.

Architectural Summary (ASCII Diagram)

                                                  (final answer / action)
+-------------------+      +---------------------------+      +------------------+
|   User / Env IO   | ---> | System 1: General Agent   | ---> |  Output / Effect |
+-------------------+      | (LLM + tools + planner)   |      +------------------+
                           +-------------+-------------+
                                         ^
                                         |
                          retrieved_context + provenance
                                         |
                              memory_tool(query, ctx)
                                         v
+----------------------------------------------------------------------------------+
|                       System 2: Agentic Memory (Slow Loop)                       |
| PreThink --> Retrieve (loop) --> Evidence Accumulate --> Early Stop (conf >= τ)  |
+----------------------------------------------------------------------------------+

Key Takeaways

Memory is a closed‑loop system, not a passive store.

System 2 is essential for scalable, plug‑in memory that co‑exists with a general LLM.

Non‑parametric memory’s ceiling is governed by interface bandwidth, view approximation error, and policy quality.

Bi‑temporal handling is a hard constraint for correctness.

The five‑component blueprint abstracts away implementation details while capturing all required modules.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
