A Comprehensive Review of Modern LLM Agent Memory Frameworks

This article surveys recent LLM‑based agent memory research, presenting a unified framework that decomposes memory systems into four components. It details their design choices, reports experimental evaluations on LOCOMO and LONGMEMEVAL, summarizes key findings, and introduces a new low‑token state‑of‑the‑art architecture.

PaperAgent

1. Unified Framework: Placing Agent Memory on a Single Diagram

As large language models such as GPT, Qwen, and Claude become more capable, LLM‑based agents are moving from single‑turn QA to long‑term tasks such as multi‑turn dialogue, personal assistants, and game agents. In these scenarios the agent must not only understand the current input but also continuously accumulate past interactions, preferences, factual changes, and task state.

A naïve solution is to place the entire interaction history into the prompt (naïve long‑context prompting), which suffers from context‑window overflow, high token cost, increased inference latency, and the model’s difficulty in locating truly relevant evidence.

The core goal of Agent Memory is therefore to avoid re‑reading the whole history each time and instead maintain a dedicated memory mechanism that can retrieve relevant information on demand to support reliable long‑term reasoning.
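The contrast with naïve long‑context prompting can be sketched in a few lines: instead of concatenating the full history into every prompt, the agent retrieves only the turns most relevant to the current query. This is a minimal illustration, not any paper's implementation; the word‑overlap scoring and the `build_prompt` helper are purely illustrative stand‑ins for a real retriever.

```python
def build_prompt(query: str, history: list[str], k: int = 2) -> str:
    """Select the k past turns most relevant to the query (here scored
    by naive word overlap) instead of concatenating the entire history."""
    q_words = set(query.lower().split())
    scored = sorted(
        history,
        key=lambda turn: len(q_words & set(turn.lower().split())),
        reverse=True,
    )
    context = "\n".join(scored[:k])
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy interaction history (illustrative).
history = [
    "User said their favourite colour is blue.",
    "User asked about the weather in Paris.",
    "User mentioned they own a cat named Milo.",
]
prompt = build_prompt("What is the user's favourite colour?", history, k=1)
```

With `k=1`, only the colour‑preference turn enters the prompt, so token cost stays roughly constant as the history grows.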

2. Four Core Components: What Exactly Makes Up Agent Memory?

Information Extraction: What to Store?

Extraction decides which content enters the memory system. Existing methods fall into three categories: direct archiving, summary‑style extraction, and graph‑based extraction.
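The three extraction categories can be contrasted as data transforms on a single dialogue turn. In this sketch the "LLM" calls are stubbed with lambdas; `MemoryItem` and the function names are assumptions for illustration, not APIs from the surveyed systems.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    kind: str       # "raw" | "summary" | "triple"
    content: object

def archive_direct(turn: str) -> MemoryItem:
    # Direct archiving: store the turn verbatim.
    return MemoryItem("raw", turn)

def extract_summary(turn: str, summarize) -> MemoryItem:
    # Summary-style extraction: an LLM distils the turn into a compact fact.
    return MemoryItem("summary", summarize(turn))

def extract_triples(turn: str, triple_fn) -> MemoryItem:
    # Graph-based extraction: (subject, relation, object) triples.
    return MemoryItem("triple", triple_fn(turn))

# Stubbed "LLM" calls, standing in for real model invocations:
fake_summarize = lambda t: "User's cat is named Milo."
fake_triples = lambda t: [("user", "has_pet", "Milo")]

turn = "By the way, I adopted a cat last week and called him Milo!"
items = [
    archive_direct(turn),
    extract_summary(turn, fake_summarize),
    extract_triples(turn, fake_triples),
]
```

Direct archiving preserves everything but is token‑heavy at retrieval time; summaries and triples trade completeness for compactness and structure.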

Memory Management: How to Maintain Memory?

Management determines how new and old memories are merged, evolved, and forgotten. The paper summarizes this process into five operations: linking related experiences, consolidating fragmented memories, migrating across hierarchical levels, updating existing memories, and filtering out useless information.
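Two of these operations, updating and filtering, are simple enough to sketch directly; the function names and the dict‑based store are illustrative assumptions, not the paper's design.

```python
def update_memory(store: dict, key: str, new_fact: str) -> None:
    """Update: overwrite a stale fact when new evidence arrives
    (e.g. the user moved cities)."""
    store[key] = new_fact

def filter_memory(store: dict, is_useful) -> dict:
    """Filter: drop entries a relevance judge deems no longer useful.
    (Linking, consolidating, and migrating would operate similarly,
    but across related entries or hierarchy levels.)"""
    return {k: v for k, v in store.items() if is_useful(v)}

store = {
    "city": "User lives in Boston",
    "note": "asked about weather once",
}
update_memory(store, "city", "User moved to Denver")
store = filter_memory(store, lambda v: "weather" not in v)
```

After both operations the store holds only the updated, useful fact; in real systems the `is_useful` judge is typically an LLM or a heuristic scorer.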

Memory Storage: Where and How to Store?

Storage can be understood along two dimensions. In terms of organization, it is either flat (e.g., JSON, queues) or hierarchical (short‑term vs long‑term, different tree levels). In terms of representation, it is either vector‑based or graph‑based.
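The flat‑versus‑hierarchical distinction can be made concrete with a two‑level store: a bounded short‑term buffer whose overflowing items migrate into an unbounded long‑term list. This is a minimal sketch under assumed semantics (`HierarchicalMemory` and its migration rule are illustrative, not taken from any surveyed system).

```python
from collections import deque

class HierarchicalMemory:
    """Hierarchical organization sketch: a bounded short-term buffer
    plus an unbounded long-term store; the oldest short-term item
    migrates to long-term storage when the buffer is full."""

    def __init__(self, short_capacity: int = 3):
        self.short_term = deque(maxlen=short_capacity)  # recent turns
        self.long_term: list[str] = []                  # consolidated

    def add(self, turn: str) -> None:
        if len(self.short_term) == self.short_term.maxlen:
            # Migrate the oldest short-term item before it is evicted.
            self.long_term.append(self.short_term[0])
        self.short_term.append(turn)

mem = HierarchicalMemory(short_capacity=2)
for t in ["turn1", "turn2", "turn3"]:
    mem.add(t)
```

A flat store would keep all three turns in one list; the hierarchy instead keeps the recent turns cheaply accessible while older material moves to a level that can be summarized or indexed differently.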

Information Retrieval: How to Fetch Relevant Memories?

Retrieval decides how the system finds the most useful information when a query arrives. The paper categorizes retrieval into four types:

Lexical matching (e.g., BM25, Jaccard) for exact entity, name, and keyword matches.

Vector retrieval using cosine similarity and ANN algorithms.

Structural retrieval that leverages explicit connections in graphs or trees, expanding via neighbors or traversals.

LLM‑assisted retrieval where the LLM participates in identifying key information or directly judging memory relevance.
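The first two retrieval types can be sketched side by side: Jaccard similarity for lexical matching and cosine similarity for vector retrieval. The toy memories and embeddings are illustrative; real systems would use BM25 indexes and ANN libraries rather than these brute‑force loops.

```python
import math

def jaccard(q: str, doc: str) -> float:
    """Lexical matching (cf. BM25/Jaccard): token-set overlap,
    good at exact entity, name, and keyword matches."""
    a, b = set(q.lower().split()), set(doc.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(u: list[float], v: list[float]) -> float:
    """Vector retrieval: cosine similarity between embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

memories = ["user adopted a cat named milo", "user lives in denver"]
query = "what is the cat named"

# Lexical retrieval: pick the memory with the highest Jaccard score.
best = max(memories, key=lambda m: jaccard(query, m))
```

Structural retrieval would instead expand from a matched graph node to its neighbors, and LLM‑assisted retrieval would replace the scoring function with a model call judging relevance.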

3. Experiments: Unified Reproduction and Systematic Comparison

3.1 What Experiments Were Conducted?

Two datasets were used:

LOCOMO – a human long‑dialogue memory dataset covering single‑hop, multi‑hop, temporal reasoning, and open‑domain knowledge.

LONGMEMEVAL – a user‑AI long‑interaction memory dataset for evaluating information extraction, multi‑turn reasoning, knowledge update, and temporal inference.

On these datasets the authors uniformly re‑implemented and compared ten representative Agent Memory methods, measuring overall performance, token consumption, performance‑cost trade‑offs, context‑size scalability, evidence‑position sensitivity, and the impact of different underlying LLM sizes.

3.2 Main Results and Observations

Hierarchical or tree‑structured methods such as MemTree, MemoryOS, and MemOS performed best, showing that multi‑level structures can retain high‑level summaries while preserving low‑level evidence for complex long‑term tasks.

Processing an entire multi‑turn dialogue as a single unit dramatically reduces token usage, and a coarser granularity can even improve memory effectiveness.

When context size expands to roughly 200% of the original, almost all methods experience performance degradation; approaches with clearer hierarchical management remain more stable.

Many methods are sensitive to evidence position: if crucial evidence appears early in the conversation, later turns can interfere and cause retrieval failures.

All memory architectures depend on the underlying LLM’s reasoning ability; scaling from Qwen2.5‑7B to 72B yields noticeable gains across most methods.

3.3 New SOTA Algorithm

Based on the findings, the authors combine the tree‑organizing strength of MemTree/MemOS with the hierarchical storage design of MemoryOS to create a new low‑token‑overhead Agent Memory framework that achieves state‑of‑the‑art performance.

[Figures: the new method's SOTA results on LONGMEMEVAL, and its performance‑cost comparison against baselines]
Paper link: https://arxiv.org/abs/2604.01707
Code repository: https://github.com/Yanchen398/Memory-in-the-LLM-Era
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: memory-management, LLM, Information Retrieval, evaluation, Agent Memory, long-term tasks, Unified Framework
Written by PaperAgent, publishing daily analyses of cutting-edge AI research papers.