
BudgetMem: A Budget Router for Runtime Agent Memory Enables Cost‑Aware Query Processing

BudgetMem introduces a query‑aware budget‑tier routing mechanism for LLM agents, allowing the memory system to dynamically allocate computational resources based on query complexity and achieving a superior performance‑cost trade‑off on several benchmarks.

Machine Heart

Jun 14, 2026


Background

Many existing agent memory systems follow a fixed "build once, use always" pipeline: memories are constructed offline and later retrieved in the same way for every query. Because this approach is query‑agnostic, it can discard details that a later query turns out to need. It also offers no explicit control over the performance‑cost trade‑off, so simple queries waste resources while complex queries run short of budget.

Runtime Query‑Aware Memory Extraction

BudgetMem replaces the static pipeline with a runtime, query‑aware extraction process. The full history is kept as raw chunks. When a query arrives, relevant chunks are filtered and passed through a modular pipeline that extracts entities, temporal information, and topics, then summarizes the result into a query‑focused memory.

Filtering → Entity / Temporal / Topic Extraction → Summarization

Each module can operate at one of three budget tiers (LOW, MID, or HIGH), so the same module can be run at different levels of computational cost and quality.
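The pipeline above can be sketched as a chain of functions that each accept a tier. This is a minimal toy illustration, not BudgetMem's implementation: the function names, the word‑overlap filter, the capitalized‑token "entity extractor", and the truncation‑based "summarizer" are all assumptions made for the example, and temporal and topic extraction are omitted for brevity.

```python
import re
from enum import IntEnum


class Tier(IntEnum):
    LOW = 0
    MID = 1
    HIGH = 2


def words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_chunks(query: str, chunks: list[str], tier: Tier) -> list[str]:
    # Toy stand-in: rank chunks by word overlap with the query.
    # A higher tier keeps more candidate chunks (hypothetical limits).
    keep = {Tier.LOW: 1, Tier.MID: 2, Tier.HIGH: 4}[tier]
    q = words(query)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return [c for c in ranked[:keep] if q & words(c)]


def extract_entities(chunks: list[str], tier: Tier) -> list[str]:
    # Toy stand-in: capitalized tokens, identical at every tier.
    # A real module would swap in a different method per tier.
    found: list[str] = []
    for chunk in chunks:
        for token in re.findall(r"\b[A-Z][a-z]+\b", chunk):
            if token not in found:
                found.append(token)
    return found


def summarize(chunks: list[str], tier: Tier) -> str:
    # Toy stand-in: concatenate and truncate; lower tiers truncate harder.
    limit = {Tier.LOW: 80, Tier.MID: 160, Tier.HIGH: 320}[tier]
    return " ".join(chunks)[:limit]


def build_memory(query: str, chunks: list[str], tiers: dict[str, Tier]) -> dict:
    """Filtering -> extraction -> summarization, one tier per module."""
    kept = filter_chunks(query, chunks, tiers["filter"])
    entities = extract_entities(kept, tiers["extract"])
    summary = summarize(kept, tiers["summarize"])
    return {"chunks": kept, "entities": entities, "summary": summary}


history = [
    "Alice moved to Berlin in March.",
    "Bob likes tea.",
    "Alice adopted a cat named Miso.",
]
query = "Where did Alice move?"
all_low = {"filter": Tier.LOW, "extract": Tier.LOW, "summarize": Tier.LOW}
all_high = {"filter": Tier.HIGH, "extract": Tier.HIGH, "summarize": Tier.HIGH}
low_memory = build_memory(query, history, all_low)
high_memory = build_memory(query, history, all_high)
```

In this toy run the LOW assignment keeps a single chunk while the HIGH assignment keeps both chunks that mention Alice, which is the kind of cost versus coverage difference the tiers are meant to expose.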

Budget‑Tier Strategies

Three orthogonal tiering strategies are explored:

Implementation Tiering: switch from rule‑based methods to lightweight models and finally to LLM‑based modules.

Reasoning Tiering: change reasoning depth from direct extraction to chain‑of‑thought style and then to multi‑step or reflection‑style processing.

Capacity Tiering: vary the model size used for a module.

These axes let BudgetMem study the performance‑cost trade‑off systematically rather than applying a single token‑saving trick.
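One way to picture the three strategies is as a lookup from (strategy, tier) to the setting a module would run with. The sketch below is only an illustration of that idea: the setting labels paraphrase the descriptions above, the capacity labels in particular are placeholders because no concrete model sizes are given here, and `resolve` is a hypothetical helper.

```python
from enum import Enum


class Tier(Enum):
    LOW = "LOW"
    MID = "MID"
    HIGH = "HIGH"


# Labels are illustrative assumptions, not settings reported for BudgetMem.
TIERING = {
    "implementation": {
        Tier.LOW: "rule_based",
        Tier.MID: "lightweight_model",
        Tier.HIGH: "llm",
    },
    "reasoning": {
        Tier.LOW: "direct_extraction",
        Tier.MID: "chain_of_thought",
        Tier.HIGH: "multi_step_or_reflection",
    },
    "capacity": {
        Tier.LOW: "small_model",
        Tier.MID: "medium_model",
        Tier.HIGH: "large_model",
    },
}


def resolve(strategy: str, tier: Tier) -> str:
    """Return the setting a module would use under a strategy and tier."""
    return TIERING[strategy][tier]
```

For example, `resolve("reasoning", Tier.MID)` returns `"chain_of_thought"`, while the same tier under `"implementation"` returns `"lightweight_model"`.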

Reinforcement‑Learning Budget Router

The system includes a lightweight Budget Router that selects the appropriate tier for each module at runtime. Because the memory extraction pipeline chains discrete steps (retrieval, rule‑based methods, small‑model calls, and LLM calls) that do not lend themselves to end‑to‑end gradient training, routing is modeled as a sequential decision process and the router is trained with reinforcement learning.

Each query processing episode receives a task reward based on answer quality and a cost reward reflecting memory extraction expense. By adjusting the cost weight, BudgetMem can shift between a performance‑first mode (prioritizing answer quality) and a cost‑sensitive mode (reducing computational expense).
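The effect of the cost weight can be illustrated with a toy calculation. The sketch below assumes the episode reward has the form task_reward minus cost_weight times cost, which is one common way to combine the two signals and not necessarily BudgetMem's exact formula. It also does not train a router with reinforcement learning: it simply enumerates every tier assignment by brute force, using made‑up per‑tier cost and quality numbers, to show how the preferred assignment moves as the weight changes.

```python
from itertools import product

TIERS = ("LOW", "MID", "HIGH")
MODULES = ("filter", "extract", "summarize")

# Hypothetical numbers, chosen only to make the trade-off visible.
COST = {"LOW": 1.0, "MID": 3.0, "HIGH": 9.0}
QUALITY = {"LOW": 0.5, "MID": 0.7, "HIGH": 0.8}


def episode_reward(task_reward: float, cost: float, cost_weight: float) -> float:
    """Assumed reward shape: answer quality minus weighted extraction cost."""
    return task_reward - cost_weight * cost


def best_assignment(cost_weight: float) -> dict[str, str]:
    """Brute-force the tier assignment with the highest reward (not RL)."""
    best, best_reward = None, float("-inf")
    for tiers in product(TIERS, repeat=len(MODULES)):
        task_reward = sum(QUALITY[t] for t in tiers) / len(tiers)
        cost = sum(COST[t] for t in tiers)
        reward = episode_reward(task_reward, cost, cost_weight)
        if reward > best_reward:
            best, best_reward = tiers, reward
    return dict(zip(MODULES, best))


performance_first = best_assignment(cost_weight=0.0)
balanced = best_assignment(cost_weight=0.02)
cost_sensitive = best_assignment(cost_weight=1.0)
```

With these toy numbers, a weight of 0 selects HIGH everywhere, a weight of 0.02 selects MID everywhere, and a weight of 1.0 selects LOW everywhere. A learned router would additionally condition on the query, which this enumeration ignores.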

Experimental Results

BudgetMem is evaluated on LoCoMo, LongMemEval, and HotpotQA, comparing against strong baselines such as ReadAgent, MemoryBank, A‑MEM, Mem0, MemoryOS, and LightMem. In the performance‑first setting, BudgetMem achieves higher overall F1 scores and better LLM‑Judge ratings than the baselines.

When the cost weight is varied, BudgetMem produces a smooth, controllable performance‑cost frontier: for a given cost it yields better effectiveness, and for a given effectiveness it reduces memory extraction cost.
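A performance‑cost frontier of this kind is the set of operating points that no other point beats on both cost and score. As a minimal sketch, the helper below extracts that set from a list of (cost, score) pairs. The function name and the sample points are hypothetical and are not results from the BudgetMem experiments.

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the non-dominated (cost, score) points, sorted by cost.

    A point is kept only if its score is strictly higher than that of
    every point with lower or equal cost.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by ascending cost; among equal costs, highest score first.
    for cost, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            frontier.append((cost, score))
            best_score = score
    return frontier


# Made-up operating points, e.g. one per cost-weight setting.
sample_points = [(1, 0.40), (2, 0.55), (2, 0.50), (4, 0.52), (6, 0.70)]
frontier = pareto_frontier(sample_points)
```

Here the points (2, 0.50) and (4, 0.52) are dropped because (2, 0.55) achieves a higher score at no greater cost.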

Further analysis shows that Implementation and Capacity Tiering cover a wider budget range suitable for deployments from low‑cost to high‑performance, while Reasoning Tiering acts as a fine‑grained quality knob within a narrow cost band.

Conclusion

The core insight of BudgetMem is that future agent memory systems should not follow a fixed store‑retrieve‑compress routine; instead, they should allocate computation on demand, according to what the current query requires. Simple queries can follow low‑cost paths for fast responses, whereas complex queries can invoke higher‑tier modules, deeper reasoning, or larger models to obtain reliable memory support. This dynamic, cost‑aware approach is positioned as a foundational capability for long‑term dialogue, personalized agents, and real‑world deployments.
