Why Memory Is the Next Critical Infrastructure for AI Agents
This survey reviews over 200 papers to propose a three‑dimensional classification framework for foundation‑agent memory. It analyzes the paradigm shift from model‑centric to utility‑centric AI and covers memory substrates, cognitive mechanisms, operation strategies, learning paradigms, evaluation metrics, applications, and future research directions.
AI Enters the "Second Half" – Memory Becomes a Core Infrastructure
The authors argue that AI research is undergoing a paradigm shift: the first half focused on model architecture innovation and benchmark scores, while the second half emphasizes problem definition, real‑world evaluation, and long‑term, dynamic, user‑dependent utility.
"Memory emerges as the critical solution to fill the utility gap." – Memory is now the bridge between ideal benchmarks and practical applications.
Three‑Dimensional Unified Classification Framework
The survey proposes a framework that examines agent memory from three complementary dimensions:
1. Memory Substrate – How Information Is Stored
Type | Definition | Typical Implementations | Pros / Cons
Internal | Stored in model weights, states, or KV cache | Parameterized knowledge, latent state, KV cache | Fast access, tight integration; costly updates, catastrophic forgetting
External | Stored in vector indexes or structured stores | Vector DBs, knowledge graphs, text logs | Scalable, easy to update; retrieval latency, possible noise
2. Cognitive Mechanism – How Memory Is Used
Memory Type | Function | Research Trend
Sensory | Short‑term retention for attention | Rapid growth (2025, multimodal/embodied)
Working | Temporary storage for task‑relevant data | Core research focus
Episodic | Stores specific experiences (time, place) | Explosive growth (2025)
Semantic | Stores abstract knowledge and facts | Steady growth
Procedural | Stores skills and operation flows | Emerging hotspot
3. Memory Subject – Who Benefits
User‑Centric Memory: Stores user preferences, interaction history, and personalization data. Challenges: dialogue memory management, long‑term personalization, privacy.
Agent‑Centric Memory: Stores the agent’s accumulated knowledge and skills. Challenges: long‑term task execution, domain‑specific solutions, cross‑task knowledge transfer.
Memory Operation Mechanisms: From Single to Multi‑Agent Systems
Single‑agent systems perform five core operations:
Store & Index: Organize information in vector, structured, or text formats for efficient retrieval.
Load & Retrieve: Filter and rank relevant memories, then inject them into the current context.
Update & Refresh: Dynamically revise memory entries to incorporate new information.
Compress & Summarize: Collapse detailed interaction histories into compact abstractions to control memory growth.
Forget & Retain: Remove outdated data while preserving high‑value knowledge.
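The five operations above can be illustrated with a minimal sketch. This is not an implementation from the survey: the class name SingleAgentMemory, the importance score, word-overlap ranking (standing in for embedding similarity), and truncation (standing in for an LLM summarizer) are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    key: str
    text: str
    importance: float = 0.5  # hypothetical score in [0, 1]
    created_at: float = field(default_factory=time.time)


class SingleAgentMemory:
    """Toy in-process store; a real system would likely use a vector index."""

    def __init__(self) -> None:
        self.entries: dict[str, MemoryEntry] = {}

    def store(self, key: str, text: str, importance: float = 0.5) -> None:
        # Store & Index: key the entry so it can be found later.
        self.entries[key] = MemoryEntry(key, text, importance)

    def retrieve(self, query: str, k: int = 3) -> list[MemoryEntry]:
        # Load & Retrieve: rank by word overlap (assumed stand-in for
        # embedding similarity) and drop entries with no overlap.
        q = set(query.lower().split())
        scored = [(len(q & set(e.text.lower().split())), e)
                  for e in self.entries.values()]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]

    def update(self, key: str, text: str) -> None:
        # Update & Refresh: overwrite the text and refresh the timestamp.
        entry = self.entries[key]
        entry.text = text
        entry.created_at = time.time()

    def compress(self, keys: list[str], new_key: str, max_words: int = 12) -> None:
        # Compress & Summarize: merge entries into one truncated entry
        # (truncation is an assumed placeholder for a real summarizer).
        words = " ".join(self.entries[k].text for k in keys).split()
        importance = max(self.entries[k].importance for k in keys)
        for k in keys:
            del self.entries[k]
        self.store(new_key, " ".join(words[:max_words]), importance)

    def forget(self, min_importance: float) -> None:
        # Forget & Retain: keep only entries at or above the threshold.
        self.entries = {k: e for k, e in self.entries.items()
                        if e.importance >= min_importance}
```

In this sketch, forgetting is driven by a single importance threshold; recency or access frequency would be equally plausible criteria.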
Multi‑agent systems face additional architectural challenges. Designs range from fully private memories to shared workspaces, hybrid private‑plus‑shared layers, and orchestrated (central‑controller) designs. Representative works include RecAgent and TradingGPT (private), MetaGPT and InteRecAgent (shared), Collaborative Memory and MirrorMind (hybrid), and ChatDev and MIRIX (orchestrated).
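As a rough illustration of the hybrid design, the sketch below gives each agent a private list of notes plus one workspace visible to all. The class HybridMemory and its methods are hypothetical and are not taken from any of the systems named above.

```python
class HybridMemory:
    """Each agent has a private store; all agents share one workspace."""

    def __init__(self, agent_ids: list[str]) -> None:
        self.private: dict[str, list[str]] = {a: [] for a in agent_ids}
        self.shared: list[str] = []

    def write(self, agent_id: str, note: str, share: bool = False) -> None:
        # share=True publishes the note to every agent; otherwise it
        # stays in the writer's private store.
        target = self.shared if share else self.private[agent_id]
        target.append(note)

    def read(self, agent_id: str) -> list[str]:
        # An agent sees its own private notes plus everything shared.
        return self.private[agent_id] + self.shared


mem = HybridMemory(["planner", "coder"])
mem.write("planner", "draft plan v1")
mem.write("coder", "API schema agreed", share=True)
print(mem.read("planner"))  # private note followed by the shared note
print(mem.read("coder"))    # shared note only
```

Under these assumptions, a fully private design corresponds to never setting share, and a fully shared design to always setting it. An orchestrated design would instead route every write through a central controller that decides where the note goes.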
Memory Learning Strategies: From Prompting to Reinforcement Learning
Level 1 – Prompt‑Based Learning
Static Prompts: Pre‑defined rules such as hierarchical memory management in MemGPT.
Dynamic Prompts: Adjusted at inference time based on feedback, e.g., self‑reflection in Reflexion.
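The difference between the two prompt styles can be sketched as follows. The rule text, the helper names build_prompt and reflect, and the prompt layout are assumptions for illustration; they are not the actual prompts used by MemGPT or Reflexion.

```python
# Hypothetical fixed rule, written once and never changed at inference time.
STATIC_RULES = "Keep recent turns in context; move older turns to archival storage."


def reflect(feedback: str) -> str:
    # Stand-in for an LLM-written self-reflection on a failed attempt.
    return f"Previous attempt failed because: {feedback}"


def build_prompt(task: str, reflections: list[str]) -> str:
    # The static part is always present; the dynamic part grows as
    # feedback from earlier attempts accumulates.
    lines = [STATIC_RULES]
    if reflections:
        lines.append("Lessons from earlier attempts:")
        lines += [f"- {r}" for r in reflections]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


first_try = build_prompt("sort the list", [])
second_try = build_prompt("sort the list", [reflect("off-by-one in loop bound")])
print(second_try)
```

With an empty reflection list the prompt is purely static; each failed attempt adds one lesson, so the same task is retried with more context.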
Level 2 – Fine‑Tuning Parameterized Strategies
Internalize memory behavior into model parameters.
Key challenges: strategy stabilization, boundary control, retrieval optimization.
Level 3 – Reinforcement Learning
Step‑Level Decisions: Learn when to store, update, or delete (e.g., Memory‑R1).
Trajectory‑Level Representations: Learn compression and summarization strategies (e.g., MemSearcher).
Cross‑Episode Memory: Accumulate reusable policies for continual learning.
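To make the step‑level case concrete, the sketch below treats each memory decision as an action chosen by a learned policy. It is a tabular, epsilon-greedy contextual bandit, a deliberately simple substitute and not the training method of Memory‑R1 or MemSearcher. The action names, the state labels, and the class MemoryPolicy are assumptions.

```python
import random
from collections import defaultdict

# Assumed action space for one memory decision.
ACTIONS = ("store", "update", "delete", "skip")


class MemoryPolicy:
    """Tabular epsilon-greedy policy over memory actions."""

    def __init__(self, epsilon: float = 0.1, lr: float = 0.5, seed: int = 0) -> None:
        self.q = defaultdict(float)  # (state, action) -> estimated reward
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def act(self, state: str) -> str:
        # Explore with probability epsilon, otherwise pick the action
        # with the highest estimated reward for this state.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state: str, action: str, reward: float) -> None:
        # Move the estimate toward the observed reward.
        key = (state, action)
        self.q[key] += self.lr * (reward - self.q[key])


policy = MemoryPolicy(epsilon=0.0)
# Hypothetical feedback: overwriting a contradicted fact helped the
# downstream task, while storing a duplicate hurt it.
policy.learn("contradiction", "update", reward=1.0)
policy.learn("contradiction", "store", reward=-1.0)
print(policy.act("contradiction"))
```

The reward here would come from downstream task success, which is what lets the policy replace a hand-written rule such as "always overwrite on conflict".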
Evaluation System – Beyond Accuracy
The survey categorizes evaluation metrics into three groups (shown in Figure 2) and highlights that current benchmarks mainly test static recall. Future evaluation should measure dynamic adaptation, preference drift, and safety boundaries.
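A small sketch can show why static recall and preference drift are different tests. The event format, the function drift_accuracy, and the two toy answer strategies below are assumptions for illustration, not a benchmark from the survey.

```python
def drift_accuracy(events, answer_fn):
    """Score answers against the latest stated preference.

    events: time-ordered tuples, either ("set", key, value) or ("ask", key).
    answer_fn(history, key): returns the value the memory system would give.
    """
    truth, history, hits, asks = {}, [], 0, 0
    for ev in events:
        if ev[0] == "set":
            truth[ev[1]] = ev[2]
            history.append(ev)
        else:
            asks += 1
            hits += answer_fn(history, ev[1]) == truth.get(ev[1])
    return hits / asks if asks else 0.0


def first_write_wins(history, key):
    # Recalls the earliest stored value and ignores later changes.
    return next((v for _, k, v in history if k == key), None)


def last_write_wins(history, key):
    # Recalls the most recent value, so it follows preference drift.
    return next((v for _, k, v in reversed(history) if k == key), None)


events = [
    ("set", "drink", "coffee"),
    ("ask", "drink"),
    ("set", "drink", "tea"),  # the user's preference drifts
    ("ask", "drink"),
]
print(drift_accuracy(events, first_write_wins))
print(drift_accuracy(events, last_write_wins))
```

Both strategies answer the first question correctly, so a test with no preference change cannot tell them apart; only the question asked after the change separates them.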
Application Scenarios – 12 Major Domains Empowered by Memory
Education, scientific research, gaming & simulation, robotics, healthcare, dialogue systems, workflow automation, software engineering, information flow & recommendation, information retrieval, finance & accounting, legal consulting.
Six Future Directions
Memory ≠ storage – modern agent memory is an active cognitive architecture involving selection, compression, forgetting, and reasoning.
Context explosion drives memory design as tasks shift from single‑turn QA to long‑term interaction.
Learning memory management itself via RL will replace hand‑crafted heuristics.
Evaluation must evolve to assess dynamic adaptation, preference drift, and safety.
Hybrid architectures dominate: combine fast internal memory with scalable external stores.
Integrate memory as a core infrastructure for reliable, efficient, personalized AI agents.
Key Takeaway
This comprehensive review of 200+ papers provides a unified roadmap for understanding and building memory systems in foundation agents, positioning memory as an indispensable infrastructure for future AI agents.
https://arxiv.org/pdf/2602.06052
