How Efficient Agents Are Redefining Memory, Tool Learning, and Planning in 2026
A joint survey by nine leading Chinese institutions outlines the efficiency crisis of modern AI agents and proposes three strategic directions—efficient memory, tool learning, and planning—detailing concrete mechanisms, representative models, and emerging trends for building high‑performing, low‑cost agents.
Efficiency Crisis of Agents
Current agent architectures suffer from an input‑solution loop: each step’s output becomes the next step’s input, so token usage accumulates, inference latency grows, and responses slow down.
Three strategic directions are proposed to mitigate this crisis:
Efficient Memory
Efficient Tool Learning
Efficient Planning
1. Efficient Memory
Figure 2: Memory lifecycle – construction, management, and access.
1.1 Working Memory (text‑based)
COMEDY: An LLM extracts session‑specific facts and compresses them into key events, user profiles, and relationship changes.
MemAgent / MEM1: Processes long inputs sequentially, rewriting a compact memory state at each step.
AgentFold: Actively folds interaction history into multi‑scale summaries plus the latest full turn.
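The rewrite‑at‑each‑step idea behind MemAgent/MEM1 can be sketched as a loop that folds each incoming chunk into a bounded memory state. The sketch below substitutes a crude character‑budget trim for the LLM rewrite; `rewrite_memory`, `process_long_input`, and `budget` are illustrative names, not the papers' API.

```python
def rewrite_memory(memory: str, chunk: str, budget: int = 200) -> str:
    """Merge the incoming chunk into the running memory state, then trim
    to a fixed character budget (a stand-in for an LLM-driven rewrite)."""
    merged = (memory + " " + chunk).strip()
    if len(merged) > budget:
        # Keep the most recent `budget` characters, cutting at a word boundary.
        merged = merged[-budget:].split(" ", 1)[-1]
    return merged

def process_long_input(chunks, budget: int = 200) -> str:
    """Sequential pass over a long input: memory stays bounded
    no matter how many chunks arrive (MemAgent/MEM1-style)."""
    memory = ""
    for chunk in chunks:
        memory = rewrite_memory(memory, chunk, budget)
    return memory
```

The point of the pattern is the invariant: memory size is constant per step, so total token cost grows linearly with input length instead of quadratically.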
1.2 Implicit Working Memory
Activation Beacon: Stores context as continuous signals by distilling layer‑wise KV activations into compact beacons.
MemoryLLM: Maintains a fixed‑size token pool that self‑updates to reuse implicit knowledge.
Titans: Updates a neural memory module during inference, writing only when prediction error is high.
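Titans' surprise‑gated write can be illustrated with a toy predictor: only items whose prediction error exceeds a threshold are committed to memory. The running‑mean predictor and `threshold` value here are stand‑ins for the paper's learned neural memory.

```python
import statistics

class SurpriseGatedMemory:
    """Write to memory only when prediction error ("surprise") is high,
    in the spirit of Titans; the predictor here is just a running mean."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold
        self.store: list[float] = []

    def predict(self) -> float:
        return statistics.fmean(self.store) if self.store else 0.0

    def observe(self, x: float) -> bool:
        """Return True if the observation was surprising enough to store."""
        surprise = abs(x - self.predict())
        if surprise > self.threshold:  # gate: only surprising items are written
            self.store.append(x)
            return True
        return False
```

The gate is what buys efficiency: unsurprising observations cost a prediction but no write, so memory growth tracks novelty rather than input length.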
1.3 External Memory
MemoryBank: Applies the Ebbinghaus forgetting curve to decay stale memories while reinforcing important items.
Memory‑R1 / Mem0: Extracts dialogue snippets, summarizes them into candidate memories, and supports CRUD operations.
A‑MEM: Converts interactions into atomic notes with context, keywords, and tags.
1.4 Graph‑Structured Memory
GraphReader: Segments long text into key elements and atomic facts, building a graph that captures long‑range dependencies.
AriGraph: A unified semantic‑scene memory graph; semantic triples update the semantic layer, while episodic scene nodes link the semantic and episodic layers.
Zep: Constructs a temporally aware knowledge graph, extracting and aligning entity relations and storing facts with expiration times.
1.5 Hierarchical Memory
MemGPT: OS‑style virtual memory paging that partitions prompts into system instructions, a writable working context, and FIFO message buffers.
MemoryOS: Three‑layer storage (short‑term dialogue pages, mid‑term topic segments, long‑term personal profiles).
LightMem: A perception‑STM‑LTM pipeline that pre‑compresses input, performs online soft updates, and consolidates offline.
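The layered promotion these systems share can be sketched as a bounded short‑term buffer whose evictions accumulate in a mid‑term map and graduate to long‑term storage once seen often enough. The tier sizes and the `promote_after` rule are illustrative, not MemoryOS's actual heuristics.

```python
from collections import deque

class HierarchicalMemory:
    """Three-tier store: a bounded short-term buffer evicts its oldest
    entries into mid-term; repeatedly evicted topics graduate to
    long-term. A toy sketch of MemoryOS-style STM/MTM/LTM layering."""

    def __init__(self, stm_size: int = 3, promote_after: int = 2):
        self.stm = deque(maxlen=stm_size)  # short-term dialogue pages
        self.mtm: dict[str, int] = {}       # mid-term topic -> eviction count
        self.ltm: set[str] = set()          # long-term profile facts
        self.promote_after = promote_after

    def add(self, topic: str):
        if len(self.stm) == self.stm.maxlen:
            evicted = self.stm[0]  # oldest page, about to fall off the buffer
            self.mtm[evicted] = self.mtm.get(evicted, 0) + 1
            if self.mtm[evicted] >= self.promote_after:
                self.ltm.add(evicted)  # recurring topic -> long-term
        self.stm.append(topic)
```

Only the small short‑term buffer ever needs to sit in the prompt; the lower tiers are consulted on demand, which is the efficiency argument for the layering.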
1.6 Multi‑Agent Memory
Shared Memory: Centralized reusable information reduces redundancy. Representative methods: MS, G‑Memory, RCR‑Router, MIRIX.
Local Memory: Each agent stores information independently, yielding lightweight, low‑noise memory. Representative methods: Intrinsic Memory Agents, AgentNet, DAMCS.
Hybrid Memory: Combines shared and local memory with coordinated routing. Representative methods: SRMT, Collaborative Memory, LEGOMem.
2. Efficient Tool Learning
Figure: Classification of tool‑learning paradigms.
2.1 Tool Selection Paradigms
External Retriever: An independent model embeds queries and tool descriptions, then computes similarity (e.g., ProTIP, AnyTool, Toolshed). Suitable for dynamic tool pools.
Multi‑Label Classification: Treats a fixed set of tools as a classification problem (e.g., TinyAgent, Tool2Vec). Works when the tool set is relatively static.
Token‑Based Retrieval: Encodes each tool as a special token predicted during generation (e.g., ToolkenGPT, Toolken+, ToolGen). Scales to massive tool libraries.
Efficiency insight: token‑based methods are fastest but less generalizable; external retrievers are plug‑and‑play but computationally heavy; multi‑label classifiers require fine‑tuning but excel in static‑tool scenarios.
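The external‑retriever paradigm reduces to embedding the query and every tool description, then ranking by similarity. The bag‑of‑words "embedding" below is a dependency‑free stand‑in for the dense encoders these systems actually use; the ranking logic is the same.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real retriever would use a
    dense encoder, but the similarity ranking works identically."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_tools(query: str, tools: dict[str, str], k: int = 1) -> list[str]:
    """Rank tools by similarity between the query and each description,
    returning the top-k tool names."""
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])),
                    reverse=True)
    return ranked[:k]
```

Because the retriever is decoupled from the generator, new tools can be added by indexing one more description, which is why this paradigm suits dynamic tool pools.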
2.2 Tool Calling Strategies
In‑Place Parameter Filling: Fill tool parameters directly during response generation (Toolformer, CoA).
Parallel Tool Calling: Identify tool calls that can be executed concurrently, reducing sequential latency (LLMCompiler, LLM‑Tool Compiler, CATP‑LLM).
Cost‑Aware Calling: Optimize calls by treating computational cost as a reward or constraint (BTP, OTC‑PO, ToolOrchestra).
Test‑Time Expansion: Use search‑based pruning (e.g., A* search) to discard erroneous branches during inference (ToolChain*).
Post‑Training Optimization: Apply reinforcement learning to minimize redundant calls (ToolRL, ReTool, PORTool).
Key finding: Parallel calling can bring latency close to a single step when task dependencies are accurately identified; cost‑aware RL methods preserve accuracy while markedly reducing call frequency.
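The dependency‑aware scheduling behind LLMCompiler‑style parallel calling can be sketched as topological leveling: calls whose dependencies are all satisfied form a level, and each level runs concurrently. The function names and thread‑pool execution here are illustrative, not any paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tool calls into levels: every call in a level depends only
    on calls from earlier levels, so a whole level can run in parallel."""
    done: set[str] = set()
    levels: list[list[str]] = []
    pending = dict(deps)
    while pending:
        ready = [c for c, d in pending.items() if d <= done]
        if not ready:
            raise ValueError("cyclic dependency between tool calls")
        levels.append(sorted(ready))
        done.update(ready)
        for c in ready:
            del pending[c]
    return levels

def run_plan(deps: dict[str, set[str]], funcs: dict):
    """Execute each level concurrently, collecting results by call name."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in schedule_levels(deps):
            for name, out in zip(level, pool.map(lambda n: funcs[n](), level)):
                results[name] = out
    return results
```

With accurate dependencies, wall‑clock latency approaches the depth of the graph (number of levels) rather than the total number of calls, which is exactly the "close to a single step" effect noted above.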
2.3 Tool‑Integrated Reasoning
Selective Calling (TableMind): An iterative plan‑act‑reflect loop with two‑stage training (SFT + RL).
SMART: Builds a dataset labeling the necessity of each call and fine‑tunes the model accordingly.
Cost‑Aware Strategy Optimization (RAPO): Rank‑aware weighting guides the model toward consistent answers.
ARTIST: Learns optimal tool usage via result‑oriented RL without step‑level supervision.
AutoTIR: Targeted reward and punishment signals discourage unnecessary tool use.
SWiRL: Filters redundant actions during parallel trajectory generation.
Trend: The field is shifting from “maximizing tool usage for accuracy” toward “Pareto‑optimal RL that minimizes redundant interactions.”
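The cost‑aware objective these methods converge on can be written as a shaped reward: task success minus a per‑call penalty. The λ weight below is an assumed hyperparameter for illustration, not a value from any of the cited papers.

```python
def cost_aware_reward(correct: bool, tool_calls: int, lam: float = 0.1) -> float:
    """Reward shaping for cost-aware tool-use RL: task success minus a
    penalty per tool call, pushing the policy toward the fewest calls
    that still solve the task (sketch; `lam` is an assumed trade-off weight)."""
    return float(correct) - lam * tool_calls
```

Under this objective a trajectory that succeeds with 2 calls outranks one that succeeds with 6, while any failed trajectory scores at or below zero, so the policy cannot game the penalty by skipping necessary calls.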
3. Efficient Planning
Figure: Overview of efficient planning.
3.1 Single‑Agent Planning Efficiency
QLASS: Uses a Q‑value critic to guide search.
ETO: Preference learning via DPO‑based trial‑and‑error.
RLTR / Planner‑R1: Process‑level reward training.
Planning w/o Search: An offline goal‑conditioned critic replaces online search.
VOYAGER: Builds reusable skill libraries for downstream tasks.
GAP: A graph representation identifies actions that can be executed in parallel.
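Value‑guided search of the kind QLASS performs can be sketched as best‑first expansion ordered by a critic's score. Here the critic is an arbitrary scoring function supplied by the caller; a trained Q‑network would take its place.

```python
import heapq

def q_guided_search(start, goal, successors, q_value, max_steps: int = 100):
    """Best-first search where a critic's Q-value orders expansion
    (QLASS-style sketch). `successors` maps a state to its next states;
    `q_value` scores a state, higher meaning more promising."""
    frontier = [(-q_value(start), start, [start])]  # max-heap via negation
    seen = {start}
    for _ in range(max_steps):
        if not frontier:
            break
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-q_value(nxt), nxt, path + [nxt]))
    return None  # budget exhausted or goal unreachable
```

A well‑trained critic lets the search expand almost only on‑path states, which is where the efficiency gain over uninformed rollouts comes from.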
3.2 Multi‑Agent Collaboration Efficiency
Multi‑agent systems improve reasoning but often incur O(N²) communication costs.
Communication Reduction: Design protocols that share only essential summaries or compressed representations.
Parallel Execution: Use graph‑based planners (e.g., GAP) to schedule independent actions across agents.
Hierarchical Coordination: Organize agents into leader‑follower hierarchies to limit broadcast traffic.
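The payoff of hierarchical coordination is easy to quantify: full pairwise broadcast among N agents costs O(N²) messages per round, while a leader‑follower star costs O(N). A minimal count:

```python
def broadcast_messages(n: int) -> int:
    """Full pairwise broadcast: every agent messages every other agent,
    giving n * (n - 1) directed messages per round."""
    return n * (n - 1)

def hierarchical_messages(n: int) -> int:
    """Leader-follower star: each of the n - 1 followers reports to one
    leader and receives one reply, so traffic is linear in n."""
    return 2 * (n - 1)
```

At 10 agents that is 90 messages versus 18 per round, and the gap widens quadratically as the team grows.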
References
https://arxiv.org/abs/2601.14192
https://efficient-agents.github.io/
https://github.com/yxf203/Awesome-Efficient-Agents