How Efficient Agents Are Redefining Memory, Tool Learning, and Planning in 2026

A joint survey by nine leading Chinese institutions outlines the efficiency crisis of modern AI agents and proposes three strategic directions—efficient memory, tool learning, and planning—detailing concrete mechanisms, representative models, and emerging trends for building high‑performing, low‑cost agents.

PaperAgent

Efficiency Crisis of Agents

Current agent architectures suffer from an input-solution loop: each step's output becomes the next step's input, so token usage accumulates across steps and inference latency grows with every turn.

Three strategic directions are proposed to mitigate this crisis:

Efficient Memory

Efficient Tool Learning

Efficient Planning

1. Efficient Memory

Figure 2: Memory lifecycle – construction, management, and access.


1.1 Working Memory (text‑based)

COMEDY: An LLM extracts session-specific facts and compresses them into key events, user profiles, and relationship changes.

MemAgent / MEM1: Processes long inputs sequentially, rewriting a compact memory state at each step.

AgentFold: Actively folds interaction history into multi-scale summaries plus the latest full turn.
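A common thread in these working-memory methods is that the agent never accumulates full history; it repeatedly rewrites one bounded memory state. A minimal sketch of that loop, where the toy `compress` function (a hypothetical placeholder, not any paper's actual rewrite prompt) stands in for the LLM rewrite step:

```python
# Sketch of sequential memory rewriting (MemAgent/MEM1-style).
# A long input is consumed chunk by chunk; after each chunk the
# agent rewrites a single fixed-budget memory state instead of
# letting the context grow with the input.

def compress(memory: str, chunk: str, budget: int) -> str:
    """Merge the new chunk into memory, then trim to `budget` chars.
    A real agent would prompt an LLM to rewrite the state; keeping
    the most recent characters is a toy stand-in."""
    merged = (memory + " " + chunk).strip()
    return merged[-budget:]

def process_long_input(chunks: list[str], budget: int = 200) -> str:
    memory = ""
    for chunk in chunks:
        memory = compress(memory, chunk, budget)
    return memory  # size stays O(budget), not O(total input)

state = process_long_input(["alpha " * 50, "beta " * 50], budget=64)
# `state` is bounded by the budget regardless of input length
```

The point of the pattern is the invariant, not the summarizer: per-step cost stays constant because the prompt never contains more than the memory budget plus one chunk.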

1.2 Implicit Working Memory

Activation Beacon: Stores context as continuous signals by distilling layer-wise KV activations into compact beacons.

MemoryLLM: Maintains a fixed-size token pool that self-updates to reuse implicit knowledge.

Titans: Updates a neural memory module during inference, writing only when prediction error is high.
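Titans' write-on-surprise rule can be illustrated with a scalar toy: write to memory only when prediction error exceeds a threshold. The key-value store, values, and threshold below are simplified assumptions, not the paper's actual neural memory module:

```python
# Toy illustration of surprise-gated memory writes (Titans-style):
# memory is updated only when prediction error ("surprise") is
# high, so routine, well-predicted inputs cost no writes.

def maybe_write(memory: dict, key: str, predicted: float,
                observed: float, threshold: float = 0.5) -> bool:
    """Write `observed` into memory only when prediction error
    exceeds the threshold; returns True on a write."""
    surprise = abs(predicted - observed)
    if surprise > threshold:
        memory[key] = observed
        return True
    return False

mem: dict = {}
maybe_write(mem, "temp", predicted=20.0, observed=20.1)  # low surprise: no write
maybe_write(mem, "temp", predicted=20.0, observed=35.0)  # high surprise: written
```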

1.3 External Memory

MemoryBank: Applies the Ebbinghaus forgetting curve to decay stale memories while reinforcing important items.

Memory-R1 / Mem0: Extracts dialogue snippets, summarizes them into candidate memories, and supports CRUD operations.

A-MEM: Converts interactions into atomic notes with context, keywords, and tags.
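MemoryBank's use of the Ebbinghaus curve can be sketched with the classic retention formula R = exp(−t/S): elapsed time t decays a memory, while each access grows its strength S so frequently used items survive. The class, threshold, and reinforcement factor below are illustrative assumptions, not MemoryBank's exact parameters:

```python
import math

def retention(elapsed_hours: float, strength: float) -> float:
    """Ebbinghaus-style retention curve R = exp(-t / S)."""
    return math.exp(-elapsed_hours / strength)

class DecayingMemory:
    """Toy MemoryBank-style store: stale items decay away, while
    accessed items are reinforced (assumed doubling of strength)."""

    def __init__(self, forget_below: float = 0.2):
        self.items: dict = {}  # key -> (stored_at_hours, strength)
        self.forget_below = forget_below

    def store(self, key: str, now: float) -> None:
        self.items[key] = (now, 1.0)

    def access(self, key: str, now: float) -> None:
        # reinforcement: recall resets the clock and doubles strength
        _, strength = self.items[key]
        self.items[key] = (now, strength * 2.0)

    def sweep(self, now: float) -> None:
        # forget items whose retention decayed below the threshold
        self.items = {k: (t, s) for k, (t, s) in self.items.items()
                      if retention(now - t, s) >= self.forget_below}
```

A sweep at t = 3 h drops a never-accessed item stored at t = 0 (retention ≈ 0.05) but keeps one that was recalled at t = 1 h (retention ≈ 0.37).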

1.4 Graph‑Structured Memory

GraphReader: Segments long text into key elements and atomic facts, building a graph that captures long-range dependencies.

AriGraph: A unified semantic-scene memory graph; semantic triples update the semantic graph, while scene nodes link the semantic and scene layers.

Zep: Constructs a temporally aware knowledge graph, extracting and aligning entity relations and storing facts with expiration times.

1.5 Hierarchical Memory

MemGPT: OS-style virtual memory paging that partitions the prompt into system instructions, a writable working context, and a FIFO message buffer.

MemoryOS: Three-layer storage (short-term dialogue pages, mid-term topic segments, long-term personal profiles).

LightMem: A Perception-STM-LTM pipeline that pre-compresses input, performs online soft updates, and consolidates memory offline.
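MemGPT's paging idea, reduced to its core: a bounded FIFO message buffer whose overflow is evicted to external storage rather than dropped. The capacity and eviction policy here are illustrative assumptions, not MemGPT's actual implementation:

```python
from collections import deque

class PagedContext:
    """OS-style paging sketch: a bounded FIFO message buffer whose
    overflow is "paged out" to an external archive, which a real
    system could later search and page back in."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.fifo: deque = deque()
        self.archive: list = []  # stands in for external storage

    def append(self, message: str) -> None:
        self.fifo.append(message)
        while len(self.fifo) > self.capacity:
            self.archive.append(self.fifo.popleft())  # evict oldest

    def prompt_window(self) -> list:
        return list(self.fifo)  # what actually enters the LLM prompt

ctx = PagedContext(capacity=3)
for i in range(5):
    ctx.append(f"turn {i}")
# the prompt holds the 3 newest turns; the 2 oldest were paged out
```

The design choice mirrors virtual memory: the prompt is the "RAM" with a hard size limit, and eviction is lossless because everything lands in the archive tier.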

1.6 Multi‑Agent Memory

Shared Memory: Centralized reusable information reduces redundancy. Representative methods: MS, G-Memory, RCR-Router, MIRIX.

Local Memory: Each agent stores information independently, yielding lightweight, low-noise memory. Representative methods: Intrinsic Memory Agents, AgentNet, DAMCS.

Hybrid Memory: Combines shared and local memory with coordinated routing. Representative methods: SRMT, Collaborative Memory, LEGOMem.

2. Efficient Tool Learning

Figure: Classification of tool‑learning paradigms.


2.1 Tool Selection Paradigms

External Retriever: An independent model embeds queries and tool descriptions, then computes similarity (e.g., ProTIP, AnyTool, Toolshed). Suitable for dynamic tool pools.

Multi-Label Classification: Treats a fixed set of tools as a classification problem (e.g., TinyAgent, Tool2Vec). Works when the tool set is relatively static.

Token-Based Retrieval: Encodes each tool as a special token predicted during generation (e.g., ToolkenGPT, Toolken+, ToolGen). Scales to massive tool libraries.

Efficiency insight: token‑based methods are fastest but less generalizable; external retrievers are plug‑and‑play but computationally heavy; multi‑label classifiers require fine‑tuning but excel in static‑tool scenarios.
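The external-retriever paradigm in miniature: embed the query and every tool description, then rank by cosine similarity. A bag-of-words vector stands in for a learned embedding model, and the tool names and descriptions are invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # toy bag-of-words "embedding"; a real retriever uses a dense model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, tools: dict, k: int = 1) -> list:
    """Rank tools by similarity between query and description."""
    q = embed(query)
    ranked = sorted(tools, key=lambda t: cosine(q, embed(tools[t])),
                    reverse=True)
    return ranked[:k]

tools = {  # hypothetical tool pool for illustration
    "weather_api": "get current weather forecast for a city",
    "calculator": "evaluate arithmetic math expressions",
    "web_search": "search the web for documents and pages",
}
top = retrieve("what is the weather forecast in Paris", tools)
```

Because retrieval only compares embeddings, tools can be added or removed from the pool without retraining, which is exactly why this paradigm suits dynamic tool sets.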

2.2 Tool Calling Strategies

In-Place Parameter Filling: Fills tool parameters directly during response generation (Toolformer, CoA).

Parallel Tool Calling: Identifies tool calls that can be executed concurrently, reducing sequential latency (LLMCompiler, LLM-Tool Compiler, CATP-LLM).

Cost-Aware Calling: Optimizes calls by treating computational cost as a reward or constraint (BTP, OTC-PO, ToolOrchestra).

Test-Time Expansion: Uses search-based pruning (e.g., A* search) to discard erroneous branches during inference (ToolChain*).

Post-Training Optimization: Applies reinforcement learning to minimize redundant calls (ToolRL, ReTool, PORTool).

Key finding: Parallel calling can bring latency close to a single step when task dependencies are accurately identified; cost‑aware RL methods preserve accuracy while markedly reducing call frequency.
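The LLMCompiler-style idea can be sketched as topological batching: at each step, every call whose dependencies are satisfied runs concurrently, so total latency tracks the depth of the dependency graph rather than the number of calls. The task graph and executor below are illustrative, not LLMCompiler's API:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_execute(calls: dict, deps: dict) -> dict:
    """Run tool calls in topological batches: every call whose
    dependencies are already done executes concurrently."""
    done: dict = {}
    while len(done) < len(calls):
        ready = [c for c in calls
                 if c not in done and all(d in done for d in deps.get(c, []))]
        if not ready:
            raise ValueError("dependency cycle among tool calls")
        with ThreadPoolExecutor() as pool:
            for name, result in zip(ready,
                                    pool.map(lambda c: calls[c](done), ready)):
                done[name] = result
    return done

# three independent lookups plus one aggregation: 2 batches, not 4 steps
calls = {
    "a": lambda done: 1,
    "b": lambda done: 2,
    "c": lambda done: 3,
    "sum": lambda done: done["a"] + done["b"] + done["c"],
}
deps = {"sum": ["a", "b", "c"]}
results = parallel_execute(calls, deps)  # results["sum"] == 6
```

Here "a", "b", and "c" run in one concurrent batch and "sum" in a second, so wall-clock latency scales with graph depth (2) instead of call count (4), matching the key finding above.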

2.3 Tool‑Integrated Reasoning

Selective Calling (TableMind): An iterative plan-act-reflect loop with two-stage training (SFT + RL).

SMART: Builds a dataset labeling the necessity of each call and fine-tunes the model accordingly.

Cost-Aware Strategy Optimization (RAPO): Rank-aware weighting guides the model toward consistent answers.

ARTIST: Result-oriented RL that learns optimal tool usage without step-level supervision.

AutoTIR: Reward and penalty signals discourage unnecessary tool use.

SWiRL: Filters redundant actions during parallel trajectory generation.

Trend: The field is shifting from “maximizing tool usage for accuracy” toward “Pareto‑optimal RL that minimizes redundant interactions.”
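The cost-aware objective shared by these RL methods can be written as a scalar reward that trades task success against call count, e.g. r = success − λ·n_calls. The λ value and reward shape below are illustrative assumptions, not any specific paper's formula:

```python
def cost_aware_reward(correct: bool, n_tool_calls: int,
                      lam: float = 0.1) -> float:
    """Reward = task success minus a per-call penalty, nudging the
    policy toward Pareto-optimal tool use: keep accuracy, cut
    redundant calls. `lam` is an illustrative trade-off weight."""
    return (1.0 if correct else 0.0) - lam * n_tool_calls

r_lean = cost_aware_reward(True, 2)   # ≈ 0.8: correct with few calls
r_heavy = cost_aware_reward(True, 7)  # ≈ 0.3: same answer, more calls
```

Under this shaping, two trajectories that reach the same answer are no longer tied: the one with fewer tool interactions earns strictly more reward, which is the Pareto pressure the trend describes.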

3. Efficient Planning

Figure: Overview of efficient planning.


3.1 Single‑Agent Planning Efficiency

QLASS: Uses a Q-value critic to guide search.

ETO: Preference learning via DPO-based trial and error.

RLTR / Planner-R1: Process-level reward training.

Planning w/o Search: An offline goal-conditioned critic replaces online search.

VOYAGER: Builds reusable skill libraries for downstream tasks.

GAP: A graph representation identifies actions that can be executed in parallel.

3.2 Multi‑Agent Collaboration Efficiency

Multi-agent systems improve reasoning but often incur O(N²) communication costs, since in a fully connected topology every one of N agents messages every other agent.

Communication Reduction: Design protocols that share only essential summaries or compressed representations.

Parallel Execution: Use graph-based planners (e.g., GAP) to schedule independent actions across agents.

Hierarchical Coordination: Organize agents into leader-follower hierarchies to limit broadcast traffic.
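The payoff of hierarchical coordination is easy to quantify: fully connected exchange costs N(N−1) directed messages per round, while a leader-follower star costs 2(N−1). A small counting sketch:

```python
def all_to_all_messages(n_agents: int) -> int:
    # fully connected: every agent messages every other -> O(N^2)
    return n_agents * (n_agents - 1)

def leader_follower_messages(n_agents: int) -> int:
    # star topology: followers report up, leader broadcasts down -> O(N)
    return 2 * (n_agents - 1)

for n in (4, 16, 64):
    print(n, all_to_all_messages(n), leader_follower_messages(n))
```

At 64 agents that is 4032 versus 126 messages per round, which is why limiting broadcast traffic dominates the other optimizations as team size grows.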

References

https://arxiv.org/abs/2601.14192
https://efficient-agents.github.io/
https://github.com/yxf203/Awesome-Efficient-Agents