
Three Forms of Large Model Memory – Parameter, Token, and Latent – Why Top Companies Are All‑In

A new paper unifies AI memory research with a three‑dimensional framework (Forms, Functions, Dynamics) that classifies memory as parameter‑level, token‑level, or latent‑space. The framework is examined against real‑world implementations from OpenAI, Google, Amazon, and dozens of open‑source frameworks, highlighting open challenges such as retrieval quality, catastrophic forgetting, and the design of forgetting mechanisms.

DataFunSummit

Jun 24, 2026


On December 15, 2025, research teams from Stanford, Fudan, Oxford, and other institutions released the paper Memory in the Age of AI Agents, which for the first time provides a unified theoretical framework for the fragmented field of AI memory systems, consolidating scattered prior work into one comprehensive classification.

Three‑Dimensional Classification

The paper proposes a three‑axis taxonomy:

Forms – how memory is stored: Token‑level, Parameter‑level, or Latent‑space.

Functions – the cognitive role of memory: Fact, Experience, or Working memory.

Dynamics – the lifecycle of memory: Formation, Evolution, and Retrieval.

Form Dimension

Token‑level memory is symbolic, addressable, and transparent, which makes it well suited to scenarios that require high interpretability and frequent updates. The PPIO team's "nine‑type" token memory (chat history, user profiles, knowledge graph, etc.) exemplifies this design. The Letta framework (formerly MemGPT) adds a "virtual memory paging" mechanism: memory is split into a Core Memory kept in the context window and a Recall Storage accessed on demand, which extends the effective context length beyond what the window alone can hold.
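The paging idea can be illustrated with a toy two‑tier store. This is a minimal sketch, not Letta's API: the class name PagedMemory, the item‑count capacity, and the keyword search are hypothetical simplifications (a real system would budget by tokens and search with embeddings or full‑text indexes).

```python
from dataclasses import dataclass, field


@dataclass
class PagedMemory:
    """Hypothetical two-tier memory: a bounded core tier and an unbounded recall tier."""
    core_capacity: int = 3                            # max items kept "in context"
    core: list[str] = field(default_factory=list)     # always sent to the model
    recall: list[str] = field(default_factory=list)   # paged out, searched on demand

    def add(self, item: str) -> None:
        self.core.append(item)
        # Page the oldest items out of the context window once it overflows.
        while len(self.core) > self.core_capacity:
            self.recall.append(self.core.pop(0))

    def search_recall(self, keyword: str) -> list[str]:
        # Case-insensitive keyword match stands in for a real search tool.
        return [m for m in self.recall if keyword.lower() in m.lower()]

    def context(self) -> str:
        # Only the core tier is rendered into the prompt.
        return "\n".join(self.core)


mem = PagedMemory(core_capacity=2)
for msg in ["User likes tea", "User lives in Oslo", "User asked about flights"]:
    mem.add(msg)
```

After the three additions, the oldest message has been paged out of the core tier but can still be found with mem.search_recall("tea"), which is the trade the paging design makes: a small, fixed prompt in exchange for an extra lookup step.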

The paper notes that token‑level memory’s strengths are transparency and controllability, while its challenges lie in retrieval quality and scaling. Zep’s Graphiti project addresses the retrieval challenge by organizing memory as a knowledge graph, improving precision.
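To see why a graph can sharpen retrieval, consider a minimal triple store that returns every fact within a few hops of an entity mentioned in the query. This is an illustrative sketch only; GraphMemory and its methods are hypothetical and far simpler than Graphiti's own data model.

```python
from collections import defaultdict


class GraphMemory:
    """Hypothetical knowledge-graph memory of (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str, str]]] = defaultdict(list)

    def add_fact(self, subj: str, rel: str, obj: str) -> None:
        triple = (subj, rel, obj)
        # Index the triple under both endpoints so either entity reaches it.
        self.edges[subj].append(triple)
        self.edges[obj].append(triple)

    def neighborhood(self, entity: str, hops: int = 1) -> set[tuple[str, str, str]]:
        """Return every triple reachable within `hops` edges of `entity`."""
        frontier, seen_nodes, found = {entity}, {entity}, set()
        for _ in range(hops):
            next_frontier = set()
            for node in frontier:
                for s, r, o in self.edges.get(node, []):
                    found.add((s, r, o))
                    for nbr in (s, o):
                        if nbr not in seen_nodes:
                            seen_nodes.add(nbr)
                            next_frontier.add(nbr)
            frontier = next_frontier
        return found


g = GraphMemory()
g.add_fact("Alice", "works_at", "Acme")
g.add_fact("Acme", "located_in", "Berlin")
g.add_fact("Bob", "likes", "jazz")
```

A question such as "which city does Alice work in?" needs two linked facts; g.neighborhood("Alice", hops=2) returns exactly those two triples and nothing about Bob, whereas a single similarity lookup may surface only one of them.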

Parameter‑level Memory

Parameter‑level memory encodes information directly into model weights through training or fine‑tuning, offering abstract, generalized knowledge at the cost of slow updates. Google DeepMind’s ReMem framework exemplifies this approach: it uses reinforcement learning to distill experience into weights and mitigates catastrophic forgetting with incremental learning and experience replay.
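Experience replay, one of the forgetting mitigations named above, is easy to show in isolation: each training batch mixes new examples with a sample of earlier ones so the weights keep seeing old data. The sketch below is the generic technique, not ReMem's actual pipeline; the function name, the 25% replay ratio, and the batch size are assumptions.

```python
import random
from typing import Iterator, Sequence


def replay_batches(new_data: Sequence[str], replay_buffer: Sequence[str],
                   batch_size: int = 8, replay_ratio: float = 0.25,
                   seed: int = 0) -> Iterator[list[str]]:
    """Yield training batches that mix new examples with replayed old ones (hypothetical helper)."""
    rng = random.Random(seed)
    # Reserve part of each batch for old data, but always leave room for new data.
    n_replay = min(int(batch_size * replay_ratio), batch_size - 1) if replay_buffer else 0
    n_new = batch_size - n_replay
    for start in range(0, len(new_data), n_new):
        batch = list(new_data[start:start + n_new])
        # Sample without replacement from the buffer of earlier experience.
        batch += rng.sample(list(replay_buffer), min(n_replay, len(replay_buffer)))
        rng.shuffle(batch)
        yield batch


old = [f"old{i}" for i in range(10)]
new = [f"new{i}" for i in range(12)]
batches = list(replay_batches(new, old, batch_size=8, replay_ratio=0.25))
```

In a real incremental‑learning loop each batch would feed a fine‑tuning step; that training call is omitted here because it depends entirely on the model stack.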

OpenAI’s Build Hour demonstration shows a hybrid "parameter + token" architecture, where proprietary knowledge is baked into the model via LoRA adapters while factual data remains in an external store, illustrating the paper’s "external parameter memory" concept.
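The adapter half of this hybrid can be made concrete with the standard LoRA update (Hu et al., 2021). This is the generic formulation, not a description of the demo's actual configuration: the frozen base weight W_0 is left untouched, only the low‑rank factors B and A are trained on the proprietary data, and the dimensions in the second line are illustrative.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\quad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\#\text{trainable} = r(d + k) \text{ vs. } dk:\quad
d = k = 4096,\; r = 8 \;\Rightarrow\; 8 \cdot 8192 = 65{,}536 \text{ vs. } 16{,}777{,}216 \approx 0.39\%
```

Because the adapter is a small, separate matrix pair, it can be swapped or retrained cheaply, while facts that change often stay in the external store where they can be updated without any training.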

Latent‑space Memory

Latent‑space memory stores information in hidden states or KV caches. It is not directly readable by humans, but it is highly efficient, especially in multimodal and edge scenarios. The 2024 MIRIX (Modular Multimodal Architecture) paper implements three sub‑types (generation, reuse, and transformation), matching the latent‑space categories described in the survey. A‑MEM applies a Zettelkasten‑style neural association layer that turns explicit token notes into latent connections, creating a hybrid memory.
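A KV cache is the simplest example of latent memory: past tokens survive only as key and value vectors, and a new query "recalls" them through attention weights rather than by reading text. The toy single‑head cache below uses standard scaled dot‑product attention; it is an illustration of the concept, not MIRIX's or any framework's mechanism, and the two‑dimensional vectors are made up.

```python
import math


class KVCache:
    """Toy single-head cache: past positions are kept only as vectors."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def append(self, key: list[float], value: list[float]) -> None:
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query: list[float]) -> list[float]:
        """Scaled dot-product attention of `query` over all cached positions."""
        if not self.keys:
            raise ValueError("cache is empty")
        d = len(query)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in self.keys]
        m = max(scores)  # subtract the max before exp() for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(self.values[0])
        # Weighted sum of cached values: the "recalled" content.
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]


cache = KVCache()
cache.append([1.0, 0.0], [10.0, 0.0])
cache.append([0.0, 1.0], [0.0, 10.0])
out = cache.attend([1.0, 0.0])
```

The query leans toward the first cached position, so the output is weighted toward its value; nothing in the cache is human‑readable, which is exactly the interpretability cost the paragraph above describes.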

Function Dimension

The paper’s functional axis classifies memory as:

Fact memory – maintains consistency (what the agent knows).

Experience memory – enables self‑improvement (what the agent has learned).

Working memory – supports the current task (what the agent is thinking).

This contrasts with the classic psychological taxonomy (episodic, semantic, procedural) and aligns better with engineering needs. Amazon’s Agentic AI stack exemplifies the functional split: Amazon Bedrock Knowledge Base for fact memory, execution‑trajectory database for experience memory, and session context for working memory.
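The functional split can be sketched as three containers with different lifetimes. In the Amazon stack described above these would be backed by a knowledge base, a trajectory database, and session context; here plain in‑memory structures stand in for those services, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical agent memory split by function rather than by storage form."""
    facts: dict[str, str] = field(default_factory=dict)    # fact memory: what the agent knows
    experiences: list[dict] = field(default_factory=list)  # experience memory: past trajectories
    working: list[str] = field(default_factory=list)       # working memory: the current session

    def remember_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def record_trajectory(self, task: str, steps: list[str], success: bool) -> None:
        self.experiences.append({"task": task, "steps": steps, "success": success})

    def successful_plans(self, task: str) -> list[list[str]]:
        # Reuse step sequences that worked before on the same kind of task.
        return [e["steps"] for e in self.experiences if e["task"] == task and e["success"]]

    def end_session(self) -> None:
        # Working memory is discarded; facts and experiences persist.
        self.working.clear()


m = AgentMemory()
m.remember_fact("user.city", "Oslo")
m.record_trajectory("book_flight", ["search", "pay"], True)
m.record_trajectory("book_flight", ["pay"], False)
m.working.append("user wants a window seat")
m.end_session()
```

The point of the split is that each store gets its own retention policy: facts are overwritten in place, experiences accumulate and are filtered by outcome, and working memory is thrown away when the session ends.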

Microsoft Copilot relies on parameter‑level memory to turn "write‑comment‑code" into an innate ability, sacrificing interpretability for speed, while Letta implements token‑level experience memory via tool functions.

Dynamic Dimension

The dynamic axis covers the full lifecycle of memory. The paper identifies five formation techniques: semantic summarization, knowledge distillation, structured construction, latent‑space representation, and parameter internalization. PPIO's structured construction (organizing dialogue flow, user profiles, and knowledge graphs) and Google Cloud's context engineering (semantic summarization after each turn) illustrate two of these methods.
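Per‑turn semantic summarization is often implemented as a rolling buffer: the last few turns stay verbatim, and older turns are folded into a running summary. The sketch below shows that variant under stated assumptions; naive_summarize is a placeholder for an LLM call, and SummarizingBuffer and keep_last are hypothetical names, not Google Cloud's API.

```python
from dataclasses import dataclass, field
from typing import Callable


def naive_summarize(previous: str, turns: list[str]) -> str:
    """Placeholder for an LLM summarizer: joins text and keeps the tail."""
    text = " | ".join(([previous] if previous else []) + turns)
    return text[-200:]


@dataclass
class SummarizingBuffer:
    keep_last: int = 4                                          # turns kept verbatim
    summarize: Callable[[str, list[str]], str] = naive_summarize
    summary: str = ""                                           # running summary of older turns
    recent: list[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.keep_last:
            # Fold the turns that fell out of the window into the summary.
            overflow = self.recent[:-self.keep_last]
            self.recent = self.recent[-self.keep_last:]
            self.summary = self.summarize(self.summary, overflow)

    def context(self) -> str:
        return f"Summary so far: {self.summary}\n" + "\n".join(self.recent)


buf = SummarizingBuffer(keep_last=2)
for t in ["t1", "t2", "t3", "t4"]:
    buf.add_turn(t)
```

Swapping naive_summarize for a real model call changes only the summarize field; the windowing logic stays the same.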

ReMem combines knowledge distillation and parameter internalization, showing that agents can transfer cross‑task knowledge after reinforcement‑learning‑based policy extraction. The paper also stresses the importance of forgetting: a good system must both remember and forget.

Forgetting Mechanisms

MemoryScope (Alibaba DAMO) implements a four‑layer biomimetic architecture with a consolidation rule that promotes frequently accessed short‑term items to long‑term storage and demotes idle long‑term items, matching the paper’s "access‑frequency‑based forgetting" algorithm. Letta’s simpler approach archives the oldest records when storage exceeds a threshold, but the paper warns that time‑only decay can delete important infrequent items; a combined time‑frequency‑importance strategy is recommended.
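A combined time‑frequency‑importance strategy can be written as a weighted retention score. The formula below is an assumption for illustration (exponential recency decay with a one‑day half‑life, a saturating access‑count term, and a stored importance value), not the exact rule used by the paper or by MemoryScope; all weights and the threshold are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    last_access: float       # timestamp in seconds
    access_count: int = 0
    importance: float = 0.5  # 0..1, e.g. assigned by an LLM or a heuristic


def retention_score(item: MemoryItem, now: float, half_life: float = 86_400.0,
                    w_time: float = 0.3, w_freq: float = 0.3, w_imp: float = 0.4) -> float:
    # Recency is 1.0 right after access and halves every `half_life` seconds.
    recency = 0.5 ** ((now - item.last_access) / half_life)
    # Frequency saturates toward 1.0 as the access count grows.
    frequency = 1 - 1 / (1 + item.access_count)
    return w_time * recency + w_freq * frequency + w_imp * item.importance


def forget(items: list[MemoryItem], now: float, threshold: float = 0.3, **kwargs) -> list[MemoryItem]:
    """Keep only items whose combined score is at or above the threshold."""
    return [it for it in items if retention_score(it, now, **kwargs) >= threshold]


now = 864_000.0  # ten days after t = 0
items = [
    MemoryItem("allergy: peanuts", 0.0, 0, 0.9),  # old, rarely used, important
    MemoryItem("said hi", 0.0, 0, 0.1),           # old, rarely used, trivial
    MemoryItem("current project", now, 5, 0.1),   # fresh and frequently used
]
kept = forget(items, now)
```

The allergy note is ten days old and never re‑accessed, so a time‑only rule would drop it; the importance term keeps it, which is the failure mode the paper warns about.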

Retrieval Strategies

The paper critiques the over‑reliance on pure vector similarity. Zep’s Graphiti adds knowledge‑graph traversal, while Cognee fuses vector, graph, and full‑text search into a hybrid retriever. Google Cloud’s best practice generates a structured "retrieval intent" with an LLM before searching, markedly improving precision in multi‑turn dialogs.
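One common way to fuse vector, graph, and full‑text results is reciprocal rank fusion (RRF), which needs only each retriever's ranking, not comparable scores. It is used here as a generic stand‑in; the source does not say which fusion method Cognee uses, and the memory IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of IDs: each ID scores sum(1 / (k + rank)), rank starting at 1.

    k = 60 is the constant suggested in the original RRF paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["m3", "m1", "m7"]
graph_hits = ["m1", "m9"]
keyword_hits = ["m1", "m3"]
fused = reciprocal_rank_fusion([vector_hits, graph_hits, keyword_hits])
```

Memory m1 is only second in the vector ranking but appears in all three lists, so it rises to the top of the fused result; items found by a single retriever sink.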

Decision Tree for Framework Selection

Based on the taxonomy, the authors propose a practical decision tree:

Choose the memory form: Token‑level for transparency (e.g., Mem0, Zep, Letta); Parameter‑level for performance‑critical black‑box use (e.g., DeepMind ReMem); Latent‑space for multimodal or edge deployments (e.g., MIRIX).

Identify functional needs: Fact memory → vector DB or knowledge graph; Experience memory → reinforcement‑learning‑based internalization or A‑MEM; Working memory → Amazon’s five‑step cycle or Letta’s virtual paging.

Select a concrete framework: Mem0 for quick API start, Zep for graph‑enhanced retrieval, Letta for virtual memory, ReMem for parameter‑level, MIRIX for multimodal latent memory, MemoryScope for Chinese‑language support.
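The three steps can be encoded as a small lookup, which makes the branches explicit. The mappings are copied from the list above; the function names and, in particular, the order in which the form checks are applied are assumptions, since the source lists the branches without a priority.

```python
FORM_EXAMPLES = {
    "token": ["Mem0", "Zep", "Letta"],
    "parameter": ["DeepMind ReMem"],
    "latent": ["MIRIX"],
}

FUNCTION_BACKENDS = {
    "fact": "vector DB or knowledge graph",
    "experience": "RL-based internalization or A-MEM",
    "working": "Amazon's five-step cycle or Letta's virtual paging",
}


def choose_form(multimodal_or_edge: bool, needs_transparency: bool) -> str:
    # Check order is an assumption: deployment constraints first, then transparency.
    if multimodal_or_edge:
        return "latent"
    if needs_transparency:
        return "token"
    return "parameter"  # performance-critical, black-box use


def plan(multimodal_or_edge: bool, needs_transparency: bool, functions: list[str]) -> dict:
    form = choose_form(multimodal_or_edge, needs_transparency)
    return {
        "form": form,
        "examples": FORM_EXAMPLES[form],
        "backends": {f: FUNCTION_BACKENDS[f] for f in functions},
    }


p = plan(multimodal_or_edge=False, needs_transparency=True, functions=["fact", "working"])
```

For a transparent, text‑only assistant that needs facts and session context, this returns the token‑level form with Mem0, Zep, or Letta as candidates.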

The paper notes that open‑source solutions for parameter‑level and latent‑space memory remain scarce, and that these approaches have higher technical barriers and lower generality.

Future Directions

The final chapter outlines five research avenues: automated memory design (agents decide what/when to store or forget), reinforcement‑learning‑driven memory architectures, multimodal latent‑space memory, shared memory for multi‑agent collaboration, and trustworthy memory (privacy, security, hallucination mitigation). The accompanying ebook documents early explorations for each direction.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
