Teaching LLMs to Manage Memory Autonomously, Dropping Manual Rules
Alibaba's new AgeMem framework turns long‑term and short‑term memory management for large language model agents into a learnable reinforcement‑learning task, replacing handcrafted rules with a three‑stage training process and achieving significant benchmark gains.
Memory Management Challenges
Existing large‑language‑model (LLM) agents split long‑term memory (LTM) and short‑term memory (STM) apart and manage both with fixed, manually designed rules. Trigger‑based systems (e.g., LangMem, Mem0) write memories at preset trigger points, while expert‑model approaches (e.g., A‑Mem) bolt on auxiliary models and complexity. This separation leads to information loss, duplicate storage, and an inability to prioritize memories intelligently.
AgeMem Framework
AgeMem unifies LTM and STM management and lets the agent learn for itself when to store, update, delete, retrieve, summarize, and filter information. All of these decisions are produced by a policy trained with a three‑stage reinforcement‑learning (RL) process rather than hard‑coded rules; the six operations are exposed to the agent as callable tools, listed below (with an illustrative sketch after the list).
Memory Operation Tools
Add: store new knowledge
Update: modify existing memories
Delete: remove stale information
Retrieve: fetch relevant LTM entries
Summary: compress dialogue history
Filter: discard irrelevant content
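To make the tool set concrete, here is a minimal sketch of what an agent‑facing memory interface along these lines could look like. Every name and signature, and the toy keyword‑overlap retrieval, are illustrative assumptions rather than AgeMem's published API.

```python
# Hypothetical memory interface exposing the six operations as agent tools.
# All names, signatures, and scoring logic are assumptions for illustration.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MemoryStore:
    ltm: dict = field(default_factory=dict)   # long-term entries, keyed by topic
    stm: list = field(default_factory=list)   # short-term dialogue buffer

    def add(self, key: str, content: str) -> None:
        """Store new knowledge in long-term memory."""
        self.ltm[key] = content

    def update(self, key: str, content: str) -> None:
        """Modify an existing memory instead of duplicating it."""
        if key in self.ltm:
            self.ltm[key] = content

    def delete(self, key: str) -> None:
        """Remove stale information."""
        self.ltm.pop(key, None)

    def retrieve(self, query: str, k: int = 3) -> list:
        """Fetch the k LTM entries most relevant to the query (toy keyword overlap)."""
        return sorted(self.ltm.values(),
                      key=lambda v: sum(w in v for w in query.split()),
                      reverse=True)[:k]

    def summary(self, keep_last: int = 5) -> str:
        """Compress the dialogue history into a short string."""
        return " | ".join(self.stm[-keep_last:])

    def filter(self, is_relevant: Callable[[str], bool]) -> None:
        """Discard short-term content judged irrelevant."""
        self.stm = [m for m in self.stm if is_relevant(m)]
```

The point in AgeMem is not the tools themselves but that a learned policy, rather than a fixed rule, decides when each one is invoked.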
Three‑Stage Training Process
Stage 1 – LTM Construction: The model learns to identify information worth long‑term storage, analogous to taking notes.
Stage 2 – STM Control: In noisy environments the model learns to filter out irrelevant inputs, similar to focusing on one voice in a crowded room.
Stage 3 – Integrated Reasoning: The agent combines LTM and STM to solve tasks, like using notes together with on‑the‑spot thinking during an exam. (A toy sketch of this staged curriculum follows.)
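As a rough illustration of the curriculum idea, the toy below trains a fresh epsilon‑greedy bandit per stage, with each stage rewarding a different memory skill. The reward shaping and update rule are assumptions for illustration, not the paper's training recipe.

```python
# Toy staged curriculum: each stage rewards a different memory skill, so the
# learner acquires them in sequence. A per-stage bandit stands in for the policy.
import random

ACTIONS = ("add", "filter", "retrieve", "noop")
STAGE_TARGET = {
    "ltm_construction": "add",      # Stage 1: learn to take notes
    "stm_control": "filter",        # Stage 2: learn to ignore noise
    "integrated": "retrieve",       # Stage 3: learn to use the notes
}

for stage, target in STAGE_TARGET.items():
    values = {a: 0.0 for a in ACTIONS}   # fresh value estimates per stage
    counts = {a: 0 for a in ACTIONS}
    for _ in range(1000):
        if random.random() < 0.1:                    # explore
            action = random.choice(ACTIONS)
        else:                                        # exploit current best
            action = max(values, key=values.get)
        reward = 1.0 if action == target else 0.0    # stage-specific shaping
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    print(stage, "->", max(values, key=values.get))  # converges to the stage's skill
```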
Step‑wise Group Relative Policy Optimization (GRPO)
AgeMem trains with Step‑wise GRPO, which back‑propagates rewards to every memory‑related decision along a trajectory instead of only at episode end. For each task the system:
Generates multiple solution paths (e.g., eight).
Ranks the paths relative to each other.
Marks the best path as a positive sample.
Broadcasts advantage to every step of the best path, rewarding early memory actions such as storing or retrieving information.
This “reward‑backtrack” mechanism lets the model receive credit for correct memory operations long before the final task outcome is known, as sketched below.
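A minimal sketch of that broadcast step, assuming one scalar reward per sampled path and the group‑relative normalization used in standard GRPO; the helper name is hypothetical, and the clipping/KL terms of the full RL loss are omitted.

```python
# Sketch of step-wise advantage broadcasting: normalize rewards within the
# group of sampled paths, then copy each path's advantage to every one of
# its steps so early memory actions (Add/Retrieve/...) also receive credit.
import statistics

def stepwise_group_advantages(path_rewards: list, path_lengths: list) -> list:
    mu = statistics.mean(path_rewards)
    sigma = statistics.pstdev(path_rewards) or 1.0   # guard against zero std
    return [[(r - mu) / sigma] * n                   # same advantage at each step
            for r, n in zip(path_rewards, path_lengths)]

# Example: 8 sampled paths for one task (as in the text); the best path's
# positive advantage is broadcast to all of its steps.
rewards = [0.0, 0.2, 0.0, 1.0, 0.1, 0.0, 0.3, 0.0]
lengths = [5, 7, 4, 9, 6, 5, 8, 4]
advs = stepwise_group_advantages(rewards, lengths)
print(advs[3][:3])  # every step of the best path shares the same advantage
```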
Empirical Results
On the ToolBench benchmark, the DeepMiner‑32B model equipped with AgeMem handled more than 100 tool calls with 33.5% accuracy.
Across five benchmarks, AgeMem improved performance over a no‑memory baseline by 49.59% for Qwen2.5‑7B and 23.52% for Qwen3‑4B, and outperformed the strongest baselines by 4.82–8.57 percentage points.
Memory‑quality scores on HotpotQA rose to 0.533 and 0.605, far exceeding competing methods.
Tool‑use behavior became more proactive: Add operations increased from 0.92 to 1.64 per task, Update from near 0 to 0.13, and Filter from 0.02 to 0.31.
Case Studies
Case 1 – Long‑Term Memory Construction: When a user changes learning preferences, AgeMem updates stored entries to avoid redundancy.
Case 2 – Short‑Term Memory under Interference: In noisy contexts, AgeMem filters out irrelevant data, keeping the task focused.
Case 3 – Integrated Task Execution: AgeMem coordinates LTM retrieval and STM processing to generate personalized responses.
Key Takeaways
Memory operations shift from rule‑driven to learning‑driven policies.
Long‑term and short‑term memory are managed within a unified framework.
The agent proactively selects, updates, and discards information.
Paper: https://arxiv.org/abs/2601.01885
