Eliminating Fragmented Memory with Mandol: An Open‑Source Lightweight In‑Memory Agent System

Mandol tackles the fragmented-memory problem of LLM agents by unifying representation, storage, and retrieval in a memory‑native architecture. On the LoCoMo and LongMemEval benchmarks it reaches up to 92.21% accuracy with roughly 5× lower latency than the fastest baseline, and it runs efficiently on consumer‑grade hardware without external databases.

Machine Heart

Jul 5, 2026

Problem Statement

LLM agents are expanding from single‑turn QA to long‑term, multi‑task collaboration in domains such as intelligent customer service, personal assistants, and medical support. Their memory modules must store cross‑session, multi‑type information and answer complex queries with low latency and traceable evidence. Existing memory systems combine vector databases, graph databases, and relational stores. This leads to fragmented representations and high cross‑database query overhead, while RAG‑style passive similarity matching returns noisy results that waste token budget and make retrieval quality unstable.

Mandol Overview

The Institute of Software, Chinese Academy of Sciences, and collaborators propose Mandol, an agglomerative, memory‑native, hierarchical agent memory system. The core idea is to collapse fragmented memory representations and heterogeneous storage into a unified in‑memory architecture.

Paper: "Mandol: An Agglomerative Agent Memory System for Long‑Term Conversations" (arXiv:2606.29778). Project repository: https://github.com/AgentCombo/Mandol

Three Core Designs

1. Hierarchical Memory Model

Mandol organizes memory into two layers:

Base Memory Layer: Stores raw interaction data as memory units (containing original information and semantic vectors), memory spaces (providing logical isolation), and explicit relations (temporal, reference, state‑update) together with implicit semantic relations, forming a unified structured semantic graph.

High‑Order Abstract Memory Layer: A large model automatically extracts event chains, entity relation graphs, and preference evolution chains from the base layer, creating abstract knowledge while preserving links back to the original units.

Bidirectional links between layers ensure that any abstract inference can be traced to original dialogue evidence.

Example: The short utterance "booked a hut in a lane" becomes an event node with temporal and spatial context, linked to other travel events (e.g., "flight delayed", "visited the Forbidden City") and implicitly connected to a prior intent "plan to book a hotel in Wangfujing". A state‑update edge records the preference shift from "Wangfujing hotel" to "hut in a lane", enabling precise retrieval.
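The two layers and their back-links can be illustrated with a minimal Python sketch. This is not Mandol's actual API: the class names `MemoryUnit`, `AbstractNode`, and `HierarchicalMemory`, and every method on them, are hypothetical names chosen for illustration. The sketch only shows how a state‑update edge and an abstract node with evidence links keep an inference traceable to the original dialogue.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    # The three explicit relation types named in the article.
    TEMPORAL = "temporal"
    REFERENCE = "reference"
    STATE_UPDATE = "state_update"


@dataclass
class MemoryUnit:
    """Base-layer unit: raw text, a memory-space tag, and an optional embedding."""
    unit_id: str
    text: str
    space: str
    vector: list[float] = field(default_factory=list)


@dataclass
class AbstractNode:
    """High-order node that keeps the ids of the base units it was derived from."""
    node_id: str
    summary: str
    evidence_ids: list[str]


class HierarchicalMemory:
    """Hypothetical container for both layers (illustration only)."""

    def __init__(self) -> None:
        self.units: dict[str, MemoryUnit] = {}
        self.edges: list[tuple[str, Relation, str]] = []
        self.abstract: dict[str, AbstractNode] = {}
        self.abstracted_by: dict[str, list[str]] = {}  # unit id -> abstract node ids

    def add_unit(self, unit: MemoryUnit) -> None:
        self.units[unit.unit_id] = unit

    def relate(self, src: str, relation: Relation, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def add_abstract(self, node: AbstractNode) -> None:
        missing = [uid for uid in node.evidence_ids if uid not in self.units]
        if missing:
            raise KeyError(f"unknown evidence units: {missing}")
        self.abstract[node.node_id] = node
        for uid in node.evidence_ids:  # downward link is stored on the node; add the upward one
            self.abstracted_by.setdefault(uid, []).append(node.node_id)

    def updates_of(self, unit_id: str) -> list[str]:
        """Ids of units that supersede `unit_id` through a state-update edge."""
        return [d for s, r, d in self.edges if s == unit_id and r is Relation.STATE_UPDATE]

    def trace(self, node_id: str) -> list[str]:
        """Original dialogue text behind an abstract node."""
        return [self.units[uid].text for uid in self.abstract[node_id].evidence_ids]


mem = HierarchicalMemory()
mem.add_unit(MemoryUnit("u1", "plan to book a hotel in Wangfujing", "trip"))
mem.add_unit(MemoryUnit("u2", "booked a hut in a lane", "trip"))
mem.relate("u1", Relation.STATE_UPDATE, "u2")
mem.add_abstract(AbstractNode("p1", "lodging preference changed", ["u1", "u2"]))
print(mem.trace("p1"))
```

In this toy version the abstract node is written by hand; in the system described above, a large model would produce it from the base layer.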

2. Memory‑Native Semantic Data Structure

To eliminate cross‑database latency, Mandol introduces a unified in‑memory storage architecture based on a semantic data structure, comprising SemanticMap and SemanticGraph:

SemanticMap merges key‑value storage with vector indexing, supporting multimodal memory units and context‑aware isolation via memory‑space tags.

SemanticGraph manages explicit edges directly in the graph and resolves implicit semantic edges on‑demand using the vector index in SemanticMap, avoiding pre‑enumeration of all possible semantic links.

Atomic mixed‑retrieval operators unify unit, space, relation, and multi‑hop queries, encapsulating vector matching and graph traversal as efficient in‑memory execution units. The active memory layer asynchronously pages to an embedded DuckDB backend for cold or long‑term data.
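To make the division of labor concrete, here is a toy sketch of the two structures. Only the names `SemanticMap` and `SemanticGraph` come from the article; the methods (`put`, `nearest`, `add_edge`, `expand`), the brute-force cosine search, and the 2‑D vectors are assumptions made for illustration, and paging to DuckDB is left out entirely. The point it shows is that explicit edges are stored, while implicit semantic edges are computed at query time from the vector index.

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticMap:
    """Key-value store whose entries also carry a vector and a memory-space tag."""

    def __init__(self):
        self._items = {}  # key -> (value, vector, space)

    def put(self, key, value, vector, space):
        self._items[key] = (value, vector, space)

    def get(self, key):
        return self._items[key][0]

    def vector_of(self, key):
        return self._items[key][1]

    def space_of(self, key):
        return self._items[key][2]

    def nearest(self, vector, k=3, space=None, exclude=()):
        """Brute-force top-k by cosine similarity, optionally limited to one space."""
        scored = [
            (cosine(vector, vec), key)
            for key, (_, vec, sp) in self._items.items()
            if key not in exclude and (space is None or sp == space)
        ]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]


class SemanticGraph:
    """Stores explicit edges; derives implicit semantic edges on demand."""

    def __init__(self, smap):
        self.smap = smap
        self._out = defaultdict(list)  # src -> [(relation, dst)]

    def add_edge(self, src, relation, dst):
        self._out[src].append((relation, dst))

    def explicit_neighbors(self, key):
        return [dst for _, dst in self._out.get(key, [])]

    def implicit_neighbors(self, key, k=2):
        # Nothing is precomputed: similarity is resolved when the query runs.
        return self.smap.nearest(
            self.smap.vector_of(key), k=k, space=self.smap.space_of(key), exclude={key}
        )

    def expand(self, query_vector, space, hops=1, k=2):
        """Mixed retrieval: one vector match for the seed, then multi-hop expansion."""
        frontier = self.smap.nearest(query_vector, k=1, space=space)
        seen = list(frontier)
        for _ in range(hops):
            next_frontier = []
            for key in frontier:
                for nb in self.explicit_neighbors(key) + self.implicit_neighbors(key, k):
                    if nb not in seen:
                        seen.append(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
        return seen


smap = SemanticMap()
smap.put("u1", "plan to book a hotel in Wangfujing", [1.0, 0.0], "trip")
smap.put("u2", "booked a hut in a lane", [0.9, 0.1], "trip")
smap.put("u3", "flight delayed", [0.0, 1.0], "trip")
smap.put("w1", "quarterly report due", [1.0, 0.0], "work")

graph = SemanticGraph(smap)
graph.add_edge("u2", "temporal", "u3")

result = graph.expand([0.9, 0.1], space="trip", hops=1, k=1)
print([smap.get(key) for key in result])
```

The `work` entry has the same vector as `u1` but never appears in the result, which is the isolation that memory‑space tags are meant to provide. A real implementation would replace the brute-force scan with an approximate nearest-neighbor index.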

3. Intelligent Quantized Retrieval

Mandol redefines retrieval as "building high‑quality context within a limited token budget" and implements a quantized retrieval pipeline without large‑model involvement:

Adaptive routing allocates token budget and performs parallel recall from relevant high‑order and base memory sources based on query features.

Internal quantization denoises each source and resolves cross‑source conflicts, removing noise and redundancy.

Final context is compacted to satisfy the token budget while preserving relevance and diversity.

This process yields dense evidence contexts with controlled token consumption.
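The three stages can be sketched in a few lines of Python. Everything here is a simplified stand-in rather than Mandol's pipeline: the keyword routing rule, the 70/30 budget split, the `slot` field used to detect conflicts, the "newest wins" policy, and the whitespace token counter are all assumptions. Candidate scores are taken as given, as if they came from an embedding model or reranker, so no large model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    source: str     # "abstract" (high-order layer) or "base"
    score: float    # relevance score, assumed to come from a retriever or reranker
    slot: str       # hypothetical label for the fact an item describes
    timestamp: int  # larger means newer


def count_tokens(text: str) -> int:
    # Whitespace word count as a crude stand-in for a real tokenizer.
    return len(text.split())


def route(query: str, budget: int) -> dict[str, int]:
    """Toy routing rule: summary-like queries lean on abstract memory."""
    wants_summary = any(w in query.lower() for w in ("prefer", "usually", "overall"))
    abstract_budget = budget * (70 if wants_summary else 30) // 100
    return {"abstract": abstract_budget, "base": budget - abstract_budget}


def denoise(cands: list[Candidate], min_score: float) -> list[Candidate]:
    """Within one source, drop low-score items and exact duplicate texts."""
    seen, kept = set(), []
    for c in sorted(cands, key=lambda c: c.score, reverse=True):
        if c.score >= min_score and c.text not in seen:
            seen.add(c.text)
            kept.append(c)
    return kept


def resolve_conflicts(cands: list[Candidate]) -> list[Candidate]:
    """When several items describe the same slot, keep only the newest one."""
    newest: dict[str, Candidate] = {}
    for c in cands:
        if c.slot not in newest or c.timestamp > newest[c.slot].timestamp:
            newest[c.slot] = c
    return list(newest.values())


def build_context(query, cands, budget, min_score=0.2) -> list[str]:
    budgets = route(query, budget)
    cleaned = []
    for source in budgets:
        cleaned += denoise([c for c in cands if c.source == source], min_score)
    merged = resolve_conflicts(cleaned)
    used = {source: 0 for source in budgets}
    context = []
    # Greedy compaction: highest score first, skipping items that overflow a source budget.
    for c in sorted(merged, key=lambda c: c.score, reverse=True):
        cost = count_tokens(c.text)
        if used[c.source] + cost <= budgets[c.source]:
            used[c.source] += cost
            context.append(c.text)
    return context


cands = [
    Candidate("user planned a Wangfujing hotel", "base", 0.8, "lodging", 1),
    Candidate("user booked a hut in a lane", "base", 0.7, "lodging", 2),
    Candidate("flight was delayed", "base", 0.5, "flight", 3),
    Candidate("flight was delayed", "base", 0.4, "flight", 3),
    Candidate("user likes spicy food", "base", 0.1, "food", 4),
    Candidate("lodging preference moved to a hut in a lane", "abstract", 0.9, "lodging_pref", 5),
]
context = build_context("where is the user staying", cands, budget=40)
print(context)
```

Because the per-source budgets sum to the total, the final context can never exceed the overall token budget. The outdated "Wangfujing hotel" item is dropped in favor of the newer booking even though its relevance score is higher, which is the kind of conflict a pure similarity ranking would miss.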

Experimental Evaluation

Mandol was evaluated on two long‑dialogue memory benchmarks, LoCoMo and LongMemEval, using GPT‑4.1‑mini as the answer generator and GPT‑4o‑mini as the evaluator.

Overall accuracy: 92.21% on LoCoMo and 88.40% on LongMemEval, the highest among representative open‑source memory systems.

For complex query types such as multi‑hop reasoning, temporal reasoning, and knowledge updates, Mandol showed a clear advantage.

With lighter retrieval back‑ends (Qwen3‑Embedding‑0.6B and bge‑reranker‑v2‑m3), Mandol still outperformed larger‑model baselines while reducing token consumption by 17.4%–20.0%.

Performance under load (10 QPS) on a server with an NVIDIA H800 GPU:

Average retrieval latency: 82.2 ms (≈5.4× faster than the fastest baseline).

Average insertion latency: 39.7 ms (≈4.8× faster than the fastest baseline).

On a consumer‑grade laptop (NVIDIA RTX 5090), latency remained lower than that of existing systems, demonstrating strong edge deployment potential. Memory usage is moderate, and eliminating external database communication cuts total processing time to between 1/9.9 and 1/4.2 of that of competing systems.
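The ratios above are easier to compare once converted to absolute and percentage terms. The short calculation below uses only the figures quoted in this section; the baseline latencies it prints are implied values, not numbers reported in the article, and they are approximate because the published speedups are rounded. The helper names are illustrative.

```python
def implied_baseline_ms(latency_ms: float, speedup: float) -> float:
    """Baseline latency implied by Mandol's latency and the stated speedup."""
    return latency_ms * speedup


def reduction_pct(denominator: float) -> float:
    """Percentage of time saved when total time falls to 1/denominator."""
    return (1 - 1 / denominator) * 100


retrieval_baseline = implied_baseline_ms(82.2, 5.4)  # fastest baseline, retrieval
insertion_baseline = implied_baseline_ms(39.7, 4.8)  # fastest baseline, insertion
low_saving = reduction_pct(4.2)
high_saving = reduction_pct(9.9)

print(f"implied retrieval baseline: ~{retrieval_baseline:.0f} ms")
print(f"implied insertion baseline: ~{insertion_baseline:.0f} ms")
print(f"processing time saved: {low_saving:.1f}% to {high_saving:.1f}%")
```

Under these assumptions the fastest baselines sit near 444 ms for retrieval and 191 ms for insertion, and a drop to 1/4.2–1/9.9 of total processing time corresponds to saving roughly 76% to 90%.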

Conclusion

Mandol’s three innovations—hierarchical memory modeling, memory‑native unified storage, and intelligent quantized retrieval—deliver high accuracy, low latency, and lightweight deployment for agents that require reliable long‑term memory. The open‑source release enables researchers and engineers to reproduce, experiment with, and extend the system for dialogue, recommendation, or companion agents.