Can AI Agents Master Long-Term Memory? Supermemory’s Near‑99% Accuracy Breakthrough
The Supermemory team’s new ASMR (Agentic Search and Memory Retrieval) system achieves almost 99% accuracy on the LongMemEval benchmark. It replaces vector‑database retrieval with parallel, specialized AI agents that ingest, search, and synthesize massive conversational histories entirely in memory, offering a potential answer to one of AI’s longstanding memory challenges.
Background and Motivation
Long‑term memory remains a core obstacle for artificial intelligence, especially when models must reason over multi‑turn dialogues, handle contradictory information, and incorporate updates over time. Traditional retrieval methods relying on vector databases and embeddings often struggle with noisy results and semantic similarity traps.
LongMemEval Benchmark
LongMemEval is designed to emulate real‑world complexity, containing over 115,000 tokens of dialogue history and requiring temporal reasoning. Existing memory systems typically falter on this benchmark due to inaccurate information extraction.
ASMR: Agentic Search and Memory Retrieval
The Supermemory team introduced ASMR, a novel architecture that eliminates the need for vector databases or embeddings. All operations run in memory, making integration into various systems—including robotics—straightforward.
Data Ingestion and Retrieval Pipeline
ASMR employs three parallel reading agents that observe raw conversation logs and extract targeted knowledge across six dimensions: personal information, preferences, events, temporal data, updates, and assistant details. Extracted structured data is stored in its native format and linked back to the source conversation.
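As a rough illustration of this ingestion step, the sketch below shards a conversation log across three parallel "reading agents" and emits structured records tagged with one of the six dimensions and a pointer back to the source turn. All names (`MemoryRecord`, `reading_agent`, `ingest`) and the keyword-matching extraction rule are hypothetical stand-ins; the real system would use LLM calls for extraction.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# The six extraction dimensions described in the article.
DIMENSIONS = [
    "personal_info", "preferences", "events",
    "temporal", "updates", "assistant_details",
]

@dataclass
class MemoryRecord:
    dimension: str
    content: str
    source_turn: int  # link back to the originating conversation turn

def reading_agent(shard: list[tuple[int, str]]) -> list[MemoryRecord]:
    """Stand-in for one reading agent: scans its shard of raw turns and
    emits structured records. A real agent would prompt an LLM instead
    of the toy substring rule used here."""
    records = []
    for i, turn in shard:
        for dim in DIMENSIONS:
            if dim.split("_")[0] in turn.lower():  # toy extraction rule
                records.append(MemoryRecord(dim, turn, i))
    return records

def ingest(turns: list[str], n_agents: int = 3) -> list[MemoryRecord]:
    # Three reading agents observe the log in parallel (round-robin shards),
    # and their structured output is merged into one in-memory store.
    indexed = list(enumerate(turns))
    shards = [indexed[a::n_agents] for a in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        batches = pool.map(reading_agent, shards)
    return [record for batch in batches for record in batch]

store = ingest(["I moved to Berlin.", "My preferences changed: I now like tea."])
```

Storing the extracted record in its native structured form, with `source_turn` preserved, is what later lets the orchestrator pull verbatim excerpts for verification.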
Parallel Search Agents
When a query arrives, three specialized search agents operate concurrently, each with a distinct focus:
Agent 1 retrieves explicit facts and statements.
Agent 2 captures contextual cues, social signals, and implicit meanings.
Agent 3 reconstructs timelines and relationship graphs.
The orchestrator aggregates findings, extracts verbatim excerpts for verification, and performs intelligent retrieval based on actual comprehension rather than mere keyword similarity.
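The fan-out/fan-in shape of this search stage can be sketched as follows. The three agent functions and `orchestrate` are hypothetical stand-ins (the article does not publish ASMR's code): each toy agent returns `(finding, verbatim_excerpt)` pairs so the orchestrator can deduplicate and keep source quotes for verification, in place of real LLM calls over the in-memory store.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the three specialized search agents.
def fact_agent(query, store):
    # Agent 1: explicit facts — toy exact-substring match.
    return [(f, f) for f in store if query.lower() in f.lower()]

def context_agent(query, store):
    # Agent 2: contextual cues — toy looser per-term match.
    terms = query.lower().split()
    return [(f, f) for f in store if any(t in f.lower() for t in terms)]

def timeline_agent(query, store):
    # Agent 3: timelines — toy version tags findings with store order.
    return [(f"{i}: {f}", f) for i, f in enumerate(store)
            if "then" in f.lower() or query.lower() in f.lower()]

def orchestrate(query, store):
    agents = [fact_agent, context_agent, timeline_agent]
    # All three agents search concurrently over the same in-memory store.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        batches = pool.map(lambda a: a(query, store), agents)
    # Aggregate, keeping the verbatim excerpt alongside each finding so
    # answers can be checked against the source conversation.
    seen, merged = set(), []
    for batch in batches:
        for finding, excerpt in batch:
            if excerpt not in seen:
                seen.add(excerpt)
                merged.append((finding, excerpt))
    return merged
```

The point of the structure, rather than the toy matching rules, is that each agent reads the store with a different objective and the orchestrator only merges and verifies, so no single retrieval heuristic has to cover facts, implicit context, and chronology at once.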
Answer Generation Pipelines
Two distinct pipelines were evaluated:
8‑Variant Ensemble: Retrieved context is routed to eight highly specialized prompt variants that run in parallel. If any variant produces the correct answer, the question is marked correct, yielding 98.60% accuracy.
12‑Variant Decision Forest: Twelve GPT‑4o‑mini‑based agents answer independently. An aggregator LLM then applies majority voting, domain trust scores, and conflict resolution to produce a single authoritative answer, achieving 97.20% accuracy.
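The scoring arithmetic of the two pipelines can be sketched as below. Note the asymmetry: the ensemble is scored any-correct (a generous metric), while the forest must commit to one answer. The function names and trust weights are illustrative, and the real aggregator's LLM-based conflict resolution is reduced here to weighted voting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_variants(variants, context, question):
    """Run prompt variants in parallel; each 'variant' here is just a
    function, standing in for a specialized LLM prompt."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda v: v(context, question), variants))

# Pipeline 1: 8-variant ensemble — the question counts as correct
# if ANY of the parallel variants produces the right answer.
def ensemble_correct(answers, gold):
    return any(a == gold for a in answers)

# Pipeline 2: 12-variant decision forest — independent answers merged
# by majority vote, weighted by per-agent domain trust scores.
def forest_answer(answers, trust=None):
    trust = trust or [1.0] * len(answers)
    scores = Counter()
    for answer, weight in zip(answers, trust):
        scores[answer] += weight
    return scores.most_common(1)[0][0]
```

Under this framing, the ensemble's higher 98.60% figure partly reflects its lenient any-correct scoring, whereas the forest's 97.20% is the accuracy of a single committed answer.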
Key Insights and Future Outlook
• Replacing vector search with active agentic retrieval avoids semantic similarity pitfalls and handles evolving information gracefully.
• Parallel processing across dedicated agents dramatically improves speed and precision while preventing information conflicts.
• Specialized agents outperform a single generic prompt, highlighting the advantage of task‑specific specialization.
Although ASMR is currently a sandbox prototype, the team plans to open‑source the full codebase and explore deployment in production environments. An open release is expected in early April.
Open Challenges
• Latency from multiple LLM calls remains a concern.
• Scaling to million‑token contexts requires further validation.
• Determining optimal upstream data storage for ingestion quality is still an open problem.
References
https://x.com/DhravyaShah/status/2035517012647272689
https://github.com/supermemoryai/supermemory
