MIA: Memory Agent Framework That Ends Forgetful Work and Drives Continuous Evolution
Introducing the Memory Intelligence Agent (MIA), a novel AI framework that combines a Planner‑Executor‑Manager architecture with dual parametric and non‑parametric memories, enabling agents to retain experience, evolve continuously through alternating reinforcement learning and test‑time learning, and achieve SOTA performance on multimodal and text‑based research tasks.
Most existing agents operate in a "forgetful" mode: each retrieval starts from scratch, reasoning paths are not persisted, and failures do not become experience. This limits their ability to grow stronger in deep research scenarios.
Never memorize something that you can look up. — Albert Einstein
To address this, the Shanghai Chuangzhi Academy and East China Normal University team propose Memory Intelligence Agent (MIA), a next‑generation memory‑agent framework designed for deep research.
The paper (https://arxiv.org/abs/2604.04503) and code repository (https://github.com/ECNU-SII/MIA) are publicly available.
Architecture
MIA builds a Planner–Executor–Manager memory system (a code sketch of how the three roles interact follows the list):
Planner: the tactical brain that drafts a research plan for the current problem and continually adjusts its strategy through test‑time learning.
Executor: a trained execution specialist that faithfully follows complex research blueprints.
Manager: the memory administrator that optimizes storage and eliminates redundancy.
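A minimal sketch of one research episode under this division of labor; all class and method names (`plan`, `execute`, `consolidate`) are illustrative stand-ins, not the paper's released API:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Non-parametric memory: persisted experience from past episodes."""
    entries: list = field(default_factory=list)

    def retrieve(self, query: str, k: int = 3) -> list:
        # Naive substring match; a real system would use embedding retrieval.
        return [e for e in self.entries if query in e][:k]

class Planner:
    """Drafts a research blueprint, conditioned on retrieved experience."""
    def plan(self, task: str, experience: list) -> list:
        return [f"search: {task}", f"verify: {task}", f"synthesize: {task}"]

class Executor:
    """Faithfully carries out one blueprint step (tool calls, retrieval, ...)."""
    def execute(self, step: str) -> str:
        return f"result({step})"

class Manager:
    """Optimizes memory storage: keeps only novel experience, no duplicates."""
    def consolidate(self, memory: MemoryStore, trajectory: list) -> None:
        for item in trajectory:
            if item not in memory.entries:
                memory.entries.append(item)

def research_episode(task, planner, executor, manager, memory):
    experience = memory.retrieve(task)
    trajectory = [executor.execute(s) for s in planner.plan(task, experience)]
    manager.consolidate(memory, trajectory)
    return trajectory

memory = MemoryStore()
print(research_episode("example question", Planner(), Executor(), Manager(), memory))
```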
Core Innovations
Dual‑memory mechanism: non‑parametric memory stores experience while parametric memory absorbs capability; each transforms into the other, forming a closed loop of continual evolution (see the sketch after this list).
Manager‑Planner‑Executor multi‑agent structure: decouples memory management, planning, and execution, and drives coordinated evolution of the Planner and Executor via alternating reinforcement learning.
Open‑world self‑evolution: combines reflection with unsupervised learning so the agent can continuously refine its strategy and update its memory during open‑world reasoning.
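A toy illustration of the dual-memory closed loop. In MIA the parametric side is the model's weights, updated by training; a plain dictionary stands in here so the sketch stays runnable without a training stack, and the quality threshold is an assumed value:

```python
from dataclasses import dataclass, field

@dataclass
class DualMemory:
    # Non-parametric memory: raw, retrievable experience (text trajectories).
    episodic: list = field(default_factory=list)
    # Parametric memory stand-in: distilled behavior the "model" internalizes.
    # In MIA this would be weight updates; a dict keeps the sketch runnable.
    internalized: dict = field(default_factory=dict)

    def remember(self, task: str, trajectory: str, quality: float) -> None:
        """Store an experience; both successes and failures are kept."""
        self.episodic.append(
            {"task": task, "trajectory": trajectory, "quality": quality}
        )

    def consolidate(self, threshold: float = 0.8) -> None:
        """Closed loop: compress high-quality episodic traces into 'weights'."""
        for e in self.episodic:
            if e["quality"] >= threshold:
                self.internalized[e["task"]] = e["trajectory"]

mem = DualMemory()
mem.remember("find paper venue", "searched arXiv -> checked DOI -> confirmed", 0.9)
mem.consolidate()
print(mem.internalized)
```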
Planner‑Executor Alignment
Traditional systems often stitch Planner and Executor together without true collaboration. MIA introduces a two‑stage alternating reinforcement learning process followed by test‑time continual learning:
Stage 1: freeze the Planner and let the Executor learn to understand and strictly follow the plan.
Stage 2: freeze the Executor and let the Planner learn to generate better plans from memory and to re‑plan when execution fails.
This "align execution first, then optimize decision" approach solves the mismatch where planning is strong but execution lags.
During inference, MIA generates multiple candidate reasoning paths, extracts non‑parametric memory from successful and failed trajectories, and updates parametric memory online, creating an almost synchronous inference‑training loop.
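One plausible shape of that loop in code; `rollout`, `judge`, and `online_update` are assumed interfaces, and the heuristic of keeping the best and worst trajectories is one way to mine both successes and failures, not necessarily the paper's exact rule:

```python
import random

def test_time_step(task, rollout, judge, memory, online_update, n_candidates=4):
    """One inference step that also learns: sample, score, remember, update."""
    candidates = [rollout(task) for _ in range(n_candidates)]
    scored = sorted(((judge(c), c) for c in candidates), key=lambda s: s[0])
    worst, best = scored[0], scored[-1]

    # Non-parametric update: persist lessons from both success and failure.
    memory.append({"task": task, "good": best[1], "bad": worst[1]})

    # Parametric update: nudge the policy toward the best trajectory, online.
    online_update(best[1], weight=best[0])
    return best[1]

# Toy usage with stubbed components.
memory = []
answer = test_time_step(
    task="example question",
    rollout=lambda t: f"path-{random.randint(0, 9)} for {t}",
    judge=lambda c: random.random(),          # stand-in for a process-quality score
    memory=memory,
    online_update=lambda traj, weight: None,  # stand-in for an online SFT/RL step
)
```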
Open‑World Evaluation Mechanism
To enable continual improvement without external feedback, MIA replaces result‑based labels with a "process quality" signal. The system evaluates reasoning rigor, evidence reliability, and conclusion soundness, even when no ground‑truth answer exists.
Inspired by academic peer review, the evaluation splits into three expert perspectives:
Logic reviewer: checks the coherence of the reasoning chain.
Fact reviewer: verifies source information and guards against hallucinations.
Result reviewer: assesses whether the task was truly completed.
A "domain chair" aggregates these judgments to provide a stable optimization signal for the agent.
Experimental Results
Across multiple text and multimodal deep‑research tasks, MIA markedly improves stability and efficiency.
SOTA breakthroughs (a & b): on LiveVQA (multimodal online search) and HotpotQA (textual sandbox search), MIA significantly outperforms leading LLMs (GPT‑5.4, Gemini‑3‑Flash, Claude‑Sonnet‑4.6) when tool calling is enabled.
Cross‑scale superiority (c): with a Qwen‑2.5‑VL‑7B executor and no tool use, MIA surpasses GPT‑5.4, GPT‑4o, and Gemini‑2.5‑Pro, approaching Gemini‑3‑Flash performance.
New benchmark for memory methods (d): across seven datasets, MIA achieves the best scores among current advanced agent‑memory approaches.
Conclusion
Agent memory should not merely store "what the result is" but should teach the agent "how to act". MIA demonstrates that an agent's upper bound is determined not by the number of external tools it can access, but by its ability to compress complex process information into refined execution instincts during each interaction with the world.