Teaching LLMs to Manage Memory Autonomously, Dropping Manual Rules

Alibaba's new AgeMem framework turns long‑term and short‑term memory management for large language model agents into a learnable reinforcement‑learning task, replacing handcrafted rules with a three‑stage training process and achieving significant benchmark gains.

AI Engineering

Memory Management Challenges

Existing large‑language‑model (LLM) agents separate long‑term memory (LTM) from short‑term memory (STM) and rely on fixed, manually designed rules. Trigger‑based systems (e.g., LangMem, Mem0) store memories at preset times, while expert‑model approaches (e.g., A‑Mem) add auxiliary complexity. This separation leads to information loss, duplicate storage, and an inability to prioritize memories intelligently.

AgeMem Framework

AgeMem unifies LTM and STM management and lets the agent learn when to store, update, delete, retrieve, summarize, and filter information. All decisions are produced by a three‑stage reinforcement‑learning (RL) training strategy rather than hard‑coded policies.

Memory Operation Tools

Add: store new knowledge

Update: modify existing memories

Delete: remove stale information

Retrieve: fetch relevant LTM entries

Summary: compress dialogue history

Filter: discard irrelevant content
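As a rough illustration, the six operations above can be sketched as a minimal tool interface. The `MemoryStore` class below is hypothetical; its names and signatures are assumptions for illustration, not the paper's API:

```python
# Illustrative sketch (not AgeMem's implementation): the six memory
# operations exposed as a minimal tool interface the agent could call.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    ltm: dict = field(default_factory=dict)   # long-term entries, keyed by id
    stm: list = field(default_factory=list)   # short-term dialogue buffer

    def add(self, key, content):              # Add: store new knowledge
        self.ltm[key] = content

    def update(self, key, content):           # Update: modify an existing memory
        if key in self.ltm:
            self.ltm[key] = content

    def delete(self, key):                    # Delete: remove stale information
        self.ltm.pop(key, None)

    def retrieve(self, query):                # Retrieve: fetch relevant LTM entries
        return [v for k, v in self.ltm.items() if query in k or query in v]

    def summary(self, max_turns=4):           # Summary: compress dialogue history
        kept, dropped = self.stm[-max_turns:], self.stm[:-max_turns]
        if dropped:
            kept = [f"[summary of {len(dropped)} earlier turns]"] + kept
        self.stm = kept

    def filter(self, predicate):              # Filter: discard irrelevant content
        self.stm = [turn for turn in self.stm if predicate(turn)]
```

In AgeMem the policy learns *when* to invoke each of these calls; the sketch only fixes what each call does to the two stores.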

Three‑Stage Training Process

Stage 1 – LTM Construction: The model learns to identify information worth long‑term storage, analogous to taking notes.

Stage 2 – STM Control: In noisy environments the model learns to filter out irrelevant inputs, similar to focusing in a crowded room.

Stage 3 – Integrated Reasoning: The agent combines LTM and STM to solve tasks, like using notes together with on‑the‑spot thinking during an exam.
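One way to picture this curriculum is a single policy trained in sequence against a stage‑specific reward. The reward names and the `rollout()`/`update()` interface below are purely illustrative assumptions, not the paper's training code:

```python
# Hypothetical sketch of the three-stage curriculum: the same policy is
# trained stage by stage, scored by a different reward at each stage.
def ltm_reward(traj):    # Stage 1: did the agent store the right facts?
    return traj["stored_useful"] - traj["stored_redundant"]

def stm_reward(traj):    # Stage 2: did the agent filter out the noise?
    return traj["noise_filtered"] - traj["signal_dropped"]

def task_reward(traj):   # Stage 3: did combined memory use solve the task?
    return 1.0 if traj["task_success"] else 0.0

def train(policy, stages=(ltm_reward, stm_reward, task_reward), epochs=1):
    for reward_fn in stages:                # curriculum: one stage at a time
        for _ in range(epochs):
            traj = policy.rollout()         # collect a trajectory
            policy.update(reward_fn(traj))  # RL update against stage reward
```

The point of the sketch is the schedule: memory‑construction skill is rewarded first, noise control second, and end‑to‑end task success only in the final stage.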

Step‑wise Group Relative Policy Optimization (GRPO)

AgeMem trains with Step‑wise GRPO, which back‑propagates rewards to every memory‑related decision along a trajectory instead of only at episode end. For each task the system:

1. Generates multiple solution paths (e.g., eight).

2. Ranks the paths relative to each other.

3. Marks the best path as a positive sample.

4. Broadcasts the advantage to every step of the best path, rewarding early memory actions such as storing or retrieving information.

This “reward‑backtrack” mechanism enables the model to learn correct memory operations long before the final task outcome.
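The steps above can be sketched as a group‑relative advantage with a step‑wise broadcast. The snippet assumes a standard GRPO‑style normalization (reward minus group mean, divided by group standard deviation); it is an interpretation of the reward‑backtrack idea, not the paper's code:

```python
# Minimal sketch of step-wise GRPO credit assignment (an assumption about
# how the "reward-backtrack" works, not AgeMem's implementation).
import statistics

def stepwise_grpo_advantages(group_rewards, steps_per_path):
    """Normalize each path's reward within its group, then broadcast that
    path's advantage to every step along the path."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # avoid division by zero
    path_adv = [(r - mean) / std for r in group_rewards]
    # Every step of a path - including early memory ops such as Add or
    # Retrieve - receives that path's advantage, so good memory decisions
    # made long before the final answer still get credit.
    return [[a] * n for a, n in zip(path_adv, steps_per_path)]
```

With four sampled paths where the first and last succeed, those two paths get a positive advantage at every step, and the failed paths get a negative one.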

Empirical Results

On the ToolBench benchmark, the DeepMiner‑32B model equipped with AgeMem handled more than 100 tool calls with 33.5% accuracy.

Across five benchmarks, AgeMem improved performance over a no‑memory baseline by 49.59% for Qwen2.5‑7B and 23.52% for Qwen3‑4B, and outperformed the strongest baselines by 4.82–8.57 percentage points.

Memory‑quality scores on HotpotQA rose to 0.533 and 0.605, far exceeding competing methods.

Tool‑use behavior became more proactive: Add operations increased from 0.92 to 1.64 per task, Update from near 0 to 0.13, and Filter from 0.02 to 0.31.

Case Studies

Case 1 – Long‑Term Memory Construction: When a user changes learning preferences, AgeMem updates stored entries to avoid redundancy.

Case 2 – Short‑Term Memory under Interference: In noisy contexts, AgeMem filters out irrelevant data, keeping the task focused.

Case 3 – Integrated Task Execution: AgeMem coordinates LTM retrieval and STM processing to generate personalized responses.

Key Takeaways

Memory operations shift from rule‑driven to learning‑driven policies.

Long‑term and short‑term memory are managed within a unified framework.

The agent proactively selects, updates, and discards information.

Paper: https://arxiv.org/abs/2601.01885

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Memory Management, LLM, benchmark, reinforcement learning, GRPO, Tool Use, AgeMem
Written by AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).