How DeepAgent Redefines General AI Reasoning with Scalable Toolsets

DeepAgent, a new end‑to‑end reasoning agent, integrates autonomous thinking, dynamic tool search, and execution to handle over 16,000 APIs, embodied tasks, and research assistance, achieving state‑of‑the‑art performance on benchmarks like TMDB, ToolBench, ALFWorld, WebShop, and GAIA.


Overview

DeepAgent is an end‑to‑end deep reasoning agent that unifies autonomous thinking, dynamic tool discovery, and tool execution within a single reasoning flow, addressing limitations of prior agents in open‑ended environments.

Figure: DeepAgent overview

Paper Details

Title: DeepAgent: A General Reasoning Agent with Scalable Toolsets

arXiv: https://arxiv.org/abs/2510.21618

GitHub: https://github.com/Rednote-DeepExperience/DeepAgent

Demonstrations

Massive Toolset – 16,000+ Real APIs

The agent can autonomously search, filter, and invoke the most suitable tool from a library of over 16,000 RapidAPI endpoints, completing complex tasks through continuous reasoning. In the demo, unavailable APIs were simulated by an LLM.
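To make the search-then-select step concrete, here is a minimal sketch of ranking tools against a query. It uses naive keyword overlap purely for illustration; the actual system would use a dense retriever over the RapidAPI documentation, and the function and tool names below are hypothetical.

```python
def search_tools(query, tool_docs, top_k=3):
    """Rank tools by keyword overlap between the query and each tool's doc.

    tool_docs: mapping of tool name -> one-line description.
    A real retriever would embed the docs; this only shows the shape
    of the search-filter-invoke pipeline.
    """
    q = set(query.lower().split())
    scored = sorted(
        tool_docs.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]
```

Given a toy library, `search_tools("weather forecast for a city", docs)` surfaces a weather API ahead of unrelated endpoints, which the agent would then invoke.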

Embodied Intelligence in ALFWorld

Using a plug‑in action set (move, observe, pick), DeepAgent navigates the ALFWorld embodied AI environment, executing goal‑directed tasks such as object manipulation and exploration.

Research Assistant Capabilities

Equipped with web search, content extraction, code execution, visual question answering, and file handling tools, DeepAgent serves as a research assistant for scholarly investigations.


Motivation and Limitations of Existing Agents

Traditional workflow constraints: Frameworks like ReAct and Plan‑and‑Solve follow a rigid “think‑act‑observe” loop, lacking global task awareness and autonomous execution; they require pre‑specified tools.

Limited toolsets in current agents: Recent agents such as Search‑o1 and WebThinker integrate only narrow tool families (e.g., search, browsing), insufficient for diverse real‑world demands.

Core Design of DeepAgent

Unified Autonomous Reasoning Core

DeepAgent places a powerful LLM at the center of a continuous reasoning chain. When external interaction is needed, the model emits special tokens such as <tool_search>... to request a tool search or <tool_call>... to invoke a tool. The system captures these tokens, performs the operation, and feeds the result back, enabling on‑demand tool usage without a fixed toolbox.

An auxiliary LLM handles lengthy tool documentation and filters noisy tool outputs, keeping the main reasoning focused.
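The special-token loop described above can be sketched as follows. This is a hypothetical reconstruction, not the paper's code: `llm_step` and `execute` are stand-ins for the reasoning model and the tool-search/tool-call backend, and the `<result>` wrapper is an assumed convention.

```python
import re

# Matches <tool_search>...</tool_search> and <tool_call>...</tool_call>
# emitted by the model mid-reasoning.
TOKEN_PATTERN = re.compile(r"<(tool_search|tool_call)>(.*?)</\1>", re.DOTALL)

def run_agent(llm_step, execute, prompt, max_steps=8):
    """Drive one reasoning episode.

    llm_step(context) -> next chunk of model text
    execute(kind, payload) -> result string for a tool search or call
    """
    context = prompt
    for _ in range(max_steps):
        chunk = llm_step(context)
        context += chunk
        match = TOKEN_PATTERN.search(chunk)
        if match is None:
            break  # no external request: the model finished its answer
        kind, payload = match.group(1), match.group(2).strip()
        result = execute(kind, payload)
        # Feed the observation back so reasoning continues in one stream.
        context += f"\n<result>{result}</result>\n"
    return context
```

The key point is that the toolbox is never fixed up front: each `<tool_search>` request can surface a different tool from the full library.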

Autonomous Memory Folding

At any critical point (e.g., after a sub‑task or when a dead‑end is detected), DeepAgent can emit a <fold_thought> command. An auxiliary model compresses the interaction history into a structured memory summary, allowing the agent to “take a breath” and rethink its strategy, reducing computation and improving success on long‑horizon tasks.
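A folding step might look like the sketch below, with the auxiliary model reduced to an arbitrary `summarize` function. The names and the "keep the last few steps verbatim" policy are illustrative assumptions, not the paper's exact mechanism.

```python
def fold_thought(history, summarize, keep_last=1):
    """Compress interaction history into a compact memory summary.

    history: list of past reasoning/tool steps (strings).
    summarize: stand-in for the auxiliary LLM; any function mapping a
    list of steps to a short string works here.
    Recent steps are kept verbatim so the agent retains local context.
    """
    if len(history) <= keep_last:
        return history
    folded, recent = history[:-keep_last], history[-keep_last:]
    summary = summarize(folded)
    # The folded summary replaces the raw transcript in the context window.
    return [f"[memory summary] {summary}"] + recent
```

Because the raw transcript is replaced by a short summary, the context stays small even on long-horizon tasks, which is where the paper reports the largest gains from folding.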

Structured Memory Inspired by the Human Brain

All historical information is stored in a stable JSON format, organized into three memory types:

Episodic Memory: Records high‑level milestones, key decisions, and major events for long‑term reflection and strategic planning.

Working Memory: Short‑term cache for current sub‑goals, immediate challenges, and next‑step plans, ensuring continuity across folding operations.

Tool Memory: Self‑updating handbook that logs each tool’s usage pattern, performance, and outcomes, enabling refined tool‑selection over time.
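The three-part layout can be expressed as a plain JSON-serializable structure. Field names below are illustrative, not the paper's schema; the point is that every memory type survives a fold as stable JSON.

```python
import json

def new_memory():
    """Return an empty structured memory in the three-part layout
    described above (episodic / working / tool)."""
    return {
        "episodic": [],   # high-level milestones and key decisions
        "working": {      # short-term state for the current sub-goal
            "sub_goal": None,
            "challenges": [],
            "next_steps": [],
        },
        "tool": {},       # per-tool usage notes and outcomes
    }

def record_tool_use(memory, tool, outcome):
    """Update the self-maintained tool handbook after each invocation."""
    entry = memory["tool"].setdefault(tool, {"calls": 0, "outcomes": []})
    entry["calls"] += 1
    entry["outcomes"].append(outcome)
    return memory
```

Keeping the structure JSON-round-trippable is what lets a folding operation serialize, compress, and restore memory without losing fields.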

ToolPO – Specialized Reinforcement Learning Framework

ToolPO (Tool Policy Optimization) is an end‑to‑end RL method for universal tool usage, introducing two innovations:

LLM Tool Simulator: During training, an auxiliary LLM simulates API responses, avoiding costly and unstable calls to thousands of real APIs.

Dual‑Advantage Attribution: Rewards are split into a global success reward and a per‑tool correctness reward. Advantage attribution ensures that only tokens responsible for a tool call receive the tool‑specific reward, leading to precise and efficient learning.
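The attribution idea reduces to masking at the token level: every token shares the global success signal, while only tool-call tokens additionally receive the tool-specific credit. The sketch below is a simplified illustration with scalar advantages; the real method operates on full RL trajectories.

```python
def toolpo_advantages(token_mask, global_adv, tool_adv):
    """Combine a trajectory-level advantage with per-tool credit.

    token_mask[i] is 1 for tokens that emit a tool call, 0 otherwise.
    Tool-call tokens get global_adv + tool_adv; all others get only
    the shared global_adv.
    """
    return [global_adv + m * tool_adv for m in token_mask]
```

This is what "only tokens responsible for a tool call receive the tool-specific reward" means operationally: the mask routes the fine-grained signal to exactly those positions.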

Figure: ToolPO architecture

Experimental Evaluation

DeepAgent was evaluated on eight benchmarks covering general tool usage and downstream applications.

TMDB benchmark: Success rate 89.0%, versus 55.0% for comparable models.

ToolBench (16,000+ APIs): Success rate 64.0%.

Embodied AI (ALFWorld), Online Shopping (WebShop), General AI Assistant (GAIA): State‑of‑the‑art scores, e.g., GAIA 53.3.

Ablation Studies

Removing end‑to‑end ToolPO training caused the largest performance drop.

Disabling memory folding significantly reduced GAIA performance.

Omitting the LLM tool simulator or advantage attribution each degraded results.

Dynamic Tool Retrieval vs. Pre‑retrieval

Traditional agents retrieve all possible tools before execution. DeepAgent discovers tools on‑the‑fly while reasoning, showing superior performance especially with tens of thousands of APIs.

Scalability with Action Steps

Increasing the maximum allowed reasoning steps consistently improves DeepAgent’s performance, widening the gap with ReAct‑based baselines.

Generalization Across Model Sizes

Applied to both 30B‑parameter and 235B‑parameter LLM backbones, DeepAgent outperformed traditional agents, and performance scaled with model size.

Figure: Performance scaling

Conclusion and Outlook

DeepAgent establishes a new benchmark for universal AI agents by combining a unified autonomous reasoning core, memory folding, structured memory, and the ToolPO training framework. Future work includes more advanced memory management, natural human‑AI collaboration, and multi‑agent systems.

Tags: memory management, large language models, reasoning, reinforcement learning
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
