From Single to Multi-Granularity: How MemGAS Keeps Conversational Agents from Forgetting
The ICLR 2026 paper introduces MemGAS, a multi-granularity memory framework that organizes dialogue history into session, turn, summary, and keyword levels. It dynamically associates these units with a Gaussian-mixture model, selects the appropriate granularity per query via entropy-based routing, and achieves state-of-the-art retrieval and QA performance on long-term conversational benchmarks.
Large language models (LLMs) have become the backbone of conversational agents, yet their fixed context windows prevent them from preserving extensive interaction histories, leading to incomplete or noisy responses in long‑term dialogues.
Existing retrieval‑augmented memory systems typically rely on a single granularity—such as session‑level or turn‑level—causing two major problems: (1) only partial relevant information is retrieved, missing key context; (2) irrelevant or redundant content pollutes the retrieved set, confusing the generator.
To address these issues, the authors propose MemGAS (Memory with Multi‑Granularity Association and Selection), a two‑stage framework. In the first stage, each conversation is encoded into four memory units: session‑level (full dialogue), turn‑level (individual turns), summary‑level (LLM‑generated summaries), and keyword‑level (extracted keywords). These units are projected into dense vectors and clustered with a Gaussian‑Mixture Model (GMM). The GMM partitions pairwise similarities into an Accept Set (highly related memories) and a Reject Set (irrelevant memories), mimicking human memory consolidation. Proposition 1 in the paper proves that, under reasonable distribution assumptions, the error‑association rate of the GMM decays exponentially.
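The association step above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes memory units are already embedded as vectors, fits a two-component 1-D Gaussian mixture over pairwise cosine similarities, and treats pairs assigned to the higher-mean component as the Accept Set (the `associate` helper name is ours).

```python
# Sketch of GMM-based memory association, assuming precomputed embeddings.
import numpy as np
from sklearn.mixture import GaussianMixture

def associate(embeddings: np.ndarray) -> set[tuple[int, int]]:
    """Return the Accept Set: index pairs of highly related memory units."""
    # Cosine similarity between every pair of memory-unit embeddings.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    iu = np.triu_indices(len(embeddings), k=1)        # unique pairs (i < j)
    pair_sims = sims[iu].reshape(-1, 1)

    # Two-component GMM over the similarity values: one Gaussian models
    # related pairs, the other unrelated pairs.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(pair_sims)
    accept_comp = int(np.argmax(gmm.means_))          # higher-mean component
    labels = gmm.predict(pair_sims)

    return {(int(i), int(j))
            for i, j, lab in zip(iu[0], iu[1], labels)
            if lab == accept_comp}
```

Pairs falling in the lower-mean component form the Reject Set; thresholding is learned from the data rather than hand-tuned, which is what lets the association adapt per conversation.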
In the second stage, an entropy-based router computes the similarity distribution between a query and the memories at each granularity, using Shannon entropy to measure confidence. Low-entropy (high-confidence) granularities receive larger weights via inverse-entropy scaling, enabling adaptive selection without manual tuning. The weighted granularity graph is then processed with Personalized PageRank (PPR) to propagate relevance scores, allowing the system to surface memories that are semantically related to the query but have low direct embedding similarity. Finally, the top-K retrieved memories undergo LLM-based redundancy filtering to discard duplicates and unrelated content before being fed to the generator.
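The routing and propagation steps can be sketched as follows. This is an assumption-laden illustration rather than the paper's implementation: `entropy_weights` softmax-normalizes each granularity's query-memory similarities, scores confidence by Shannon entropy, and applies inverse-entropy scaling; `personalized_pagerank` runs plain power iteration over a weighted memory graph with the query relevance as the personalization (seed) vector. Function names and defaults (e.g., `alpha=0.85`) are our choices.

```python
import numpy as np

def entropy_weights(sims_per_granularity: list[np.ndarray],
                    eps: float = 1e-8) -> np.ndarray:
    """Weight each granularity by the inverse entropy of its
    softmax-normalized query-memory similarity distribution."""
    weights = []
    for sims in sims_per_granularity:
        p = np.exp(sims) / np.exp(sims).sum()      # softmax over memories
        h = -(p * np.log(p + eps)).sum()           # Shannon entropy
        weights.append(1.0 / (h + eps))            # low entropy -> big weight
    w = np.array(weights)
    return w / w.sum()                             # normalize to sum to 1

def personalized_pagerank(adj: np.ndarray, seed: np.ndarray,
                          alpha: float = 0.85, iters: int = 50) -> np.ndarray:
    """Propagate relevance over the memory graph by power iteration.
    `adj` is a nonnegative weighted adjacency matrix with no empty columns;
    `seed` holds the query's relevance mass per node."""
    trans = adj / adj.sum(axis=0, keepdims=True)   # column-stochastic
    seed = seed / seed.sum()
    r = seed.copy()
    for _ in range(iters):
        r = alpha * trans @ r + (1 - alpha) * seed
    return r
```

A peaked (confident) similarity distribution thus dominates the routing weights, while PPR lets relevance flow from directly matched memories to their associated neighbors, which is how memories with low direct embedding similarity still surface.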
The framework is evaluated on four long‑term memory benchmarks—LoCoMo, Long‑MT‑Bench+, LongMemEval‑s, and LongMemEval‑m—against strong baselines such as Full History, MPNet, Contriever, HippoRAG 2, and RAPTOR. MemGAS consistently outperforms all baselines. For example, on LongMemEval‑s it achieves an F1 score of 20.38 (38.4% higher than the best baseline, HippoRAG 2, at 14.73) and a GPT‑4o‑Judge rating of 60.20. Retrieval metrics also improve, with Recall@3 reaching 78.51 and NDCG@3 reaching 86.83.
Ablation studies show that removing any of the four core modules—GMM association, PPR graph retrieval, the multi‑granularity association layer, or the entropy router—degrades performance. Eliminating all modules drops F1 to 13.78 and Recall@3 to 71.06, confirming each component’s necessity. The added computational overhead is minimal (≤ 0.0191 s), with LLM API calls accounting for over 98 % of end‑to‑end latency.
Further analysis compares query types, top‑K settings, and single‑ versus multi‑granularity approaches. MemGAS excels on multi‑session and multi‑hop queries, and its entropy‑driven routing maintains high performance across varying top‑K values by effectively filtering noise. Compared with single‑granularity methods and a naïve multi‑granularity concatenation (Combination), MemGAS achieves superior scores on all datasets, e.g., F1 = 20.38 vs. 14.59 (Combination) and 14.94 (best single granularity, Turn‑level) on LongMemEval‑s.
Error analysis partitions outcomes into four quadrants (retrieval correct/incorrect × generation correct/incorrect). On LongMemEval‑s, 56 % of cases fall into the “retrieval correct + generation correct” quadrant, demonstrating that MemGAS reliably identifies relevant context and avoids hallucinations.
In summary, the paper contributes (1) a novel multi‑granularity association framework that leverages GMM clustering to build dynamic cross‑granular memory links, (2) an entropy‑based adaptive router that balances information completeness and noise suppression, and (3) extensive empirical evidence showing significant gains in both retrieval and generation tasks for long‑term conversational agents.