Can AI Agents Master Long-Term Memory? Supermemory’s Near‑99% Accuracy Breakthrough
The Supermemory team’s new ASMR (Agentic Search and Memory Retrieval) system achieves almost 99% accuracy on the LongMemEval benchmark. It replaces vector‑database retrieval with parallel, specialized AI agents that ingest, search, and synthesize massive conversational histories entirely in memory, offering a potential answer to one of AI’s longstanding memory challenges.
Background and Motivation
Long‑term memory remains a core obstacle for artificial intelligence, especially when models must reason over multi‑turn dialogues, handle contradictory information, and incorporate updates over time. Traditional retrieval methods relying on vector databases and embeddings often struggle with noisy results and semantic similarity traps.
LongMemEval Benchmark
LongMemEval is designed to emulate real‑world complexity, containing over 115,000 tokens of dialogue history and requiring temporal reasoning. Existing memory systems typically falter on this benchmark due to inaccurate information extraction.
ASMR: Agentic Search and Memory Retrieval
The Supermemory team introduced ASMR, a novel architecture that eliminates the need for vector databases or embeddings. All operations run in memory, making integration into various systems—including robotics—straightforward.
Data Ingestion and Retrieval Pipeline
ASMR employs three parallel reading agents that observe raw conversation logs and extract targeted knowledge across six dimensions: personal information, preferences, events, temporal data, updates, and assistant details. Extracted structured data is stored in its native format and linked back to the source conversation.
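As a rough illustration of this ingestion step, the sketch below shards a conversation log across three parallel "reading agents" and emits structured records tagged with one of the six dimensions and a pointer back to the source turn. All names (`MemoryRecord`, `reading_agent`, `ingest`) and the keyword-matching extraction rule are hypothetical stand-ins; the real system would use LLM calls for extraction.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# The six extraction dimensions described in the article.
DIMENSIONS = [
    "personal_info", "preferences", "events",
    "temporal", "updates", "assistant_details",
]

@dataclass
class MemoryRecord:
    dimension: str
    content: str
    source_turn: int  # link back to the originating conversation turn

def reading_agent(shard: list[tuple[int, str]]) -> list[MemoryRecord]:
    """Stand-in for one reading agent: scans its shard of raw turns and
    emits structured records. A real agent would prompt an LLM instead
    of the toy substring rule used here."""
    records = []
    for i, turn in shard:
        for dim in DIMENSIONS:
            if dim.split("_")[0] in turn.lower():  # toy extraction rule
                records.append(MemoryRecord(dim, turn, i))
    return records

def ingest(turns: list[str], n_agents: int = 3) -> list[MemoryRecord]:
    # Three reading agents observe the log in parallel (round-robin shards),
    # and their structured output is merged into one in-memory store.
    indexed = list(enumerate(turns))
    shards = [indexed[a::n_agents] for a in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        batches = pool.map(reading_agent, shards)
    return [record for batch in batches for record in batch]

store = ingest(["I moved to Berlin.", "My preferences changed: I now like tea."])
```

Storing the extracted record in its native structured form, with `source_turn` preserved, is what later lets the orchestrator pull verbatim excerpts for verification.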
Parallel Search Agents
When a query arrives, three specialized search agents operate concurrently, each with a distinct focus:
Agent 1 retrieves explicit facts and statements.
Agent 2 captures contextual cues, social signals, and implicit meanings.
Agent 3 reconstructs timelines and relationship graphs.
The orchestrator aggregates findings, extracts verbatim excerpts for verification, and performs intelligent retrieval based on actual comprehension rather than mere keyword similarity.
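The fan-out/fan-in shape of this search stage can be sketched as follows. The three agent functions and `orchestrate` are hypothetical stand-ins (the article does not publish ASMR's code): each toy agent returns `(finding, verbatim_excerpt)` pairs so the orchestrator can deduplicate and keep source quotes for verification, in place of real LLM calls over the in-memory store.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the three specialized search agents.
def fact_agent(query, store):
    # Agent 1: explicit facts — toy exact-substring match.
    return [(f, f) for f in store if query.lower() in f.lower()]

def context_agent(query, store):
    # Agent 2: contextual cues — toy looser per-term match.
    terms = query.lower().split()
    return [(f, f) for f in store if any(t in f.lower() for t in terms)]

def timeline_agent(query, store):
    # Agent 3: timelines — toy version tags findings with store order.
    return [(f"{i}: {f}", f) for i, f in enumerate(store)
            if "then" in f.lower() or query.lower() in f.lower()]

def orchestrate(query, store):
    agents = [fact_agent, context_agent, timeline_agent]
    # All three agents search concurrently over the same in-memory store.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        batches = pool.map(lambda a: a(query, store), agents)
    # Aggregate, keeping the verbatim excerpt alongside each finding so
    # answers can be checked against the source conversation.
    seen, merged = set(), []
    for batch in batches:
        for finding, excerpt in batch:
            if excerpt not in seen:
                seen.add(excerpt)
                merged.append((finding, excerpt))
    return merged
```

The point of the structure, rather than the toy matching rules, is that each agent reads the store with a different objective and the orchestrator only merges and verifies, so no single retrieval heuristic has to cover facts, implicit context, and chronology at once.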
Answer Generation Pipelines
Two distinct pipelines were evaluated:
8‑Variant Ensemble: Retrieved context is routed to eight highly specialized prompt variants that run in parallel. If any variant produces the correct answer, the question is marked correct, yielding 98.60% accuracy.
12‑Variant Decision Forest: Twelve GPT‑4o‑mini‑based agents answer independently. An aggregator LLM then applies majority voting, domain trust scores, and conflict resolution to produce a single authoritative answer, achieving 97.20% accuracy.
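The scoring arithmetic of the two pipelines can be sketched as below. Note the asymmetry: the ensemble is scored any-correct (a generous metric), while the forest must commit to one answer. The function names and trust weights are illustrative, and the real aggregator's LLM-based conflict resolution is reduced here to weighted voting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_variants(variants, context, question):
    """Run prompt variants in parallel; each 'variant' here is just a
    function, standing in for a specialized LLM prompt."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda v: v(context, question), variants))

# Pipeline 1: 8-variant ensemble — the question counts as correct
# if ANY of the parallel variants produces the right answer.
def ensemble_correct(answers, gold):
    return any(a == gold for a in answers)

# Pipeline 2: 12-variant decision forest — independent answers merged
# by majority vote, weighted by per-agent domain trust scores.
def forest_answer(answers, trust=None):
    trust = trust or [1.0] * len(answers)
    scores = Counter()
    for answer, weight in zip(answers, trust):
        scores[answer] += weight
    return scores.most_common(1)[0][0]
```

Under this framing, the ensemble's higher 98.60% figure partly reflects its lenient any-correct scoring, whereas the forest's 97.20% is the accuracy of a single committed answer.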
Key Insights and Future Outlook
• Replacing vector search with active agentic retrieval avoids semantic similarity pitfalls and handles evolving information gracefully.
• Parallel processing across dedicated agents dramatically improves speed and precision while preventing information conflicts.
• Specialized agents outperform a single generic prompt, highlighting the advantage of task‑specific specialization.
Although ASMR is currently a sandbox prototype, the team plans to open‑source the full codebase and explore deployment in production environments. An open release is expected in early April.
Open Challenges
• Latency from multiple LLM calls remains a concern.
• Scaling to million‑token contexts requires further validation.
• Determining optimal upstream data storage for ingestion quality is still an open problem.
References
https://x.com/DhravyaShah/status/2035517012647272689
https://github.com/supermemoryai/supermemory
