How DeepAgent Redefines General AI Reasoning with Scalable Toolsets
DeepAgent is a new end‑to‑end reasoning agent that integrates autonomous thinking, dynamic tool search, and tool execution to handle over 16,000 APIs, embodied tasks, and research assistance. It achieves state‑of‑the‑art performance on benchmarks including TMDB, ToolBench, ALFWorld, WebShop, and GAIA.
Overview
DeepAgent is an end‑to‑end deep reasoning agent that unifies autonomous thinking, dynamic tool discovery, and tool execution within a single reasoning flow, addressing limitations of prior agents in open‑ended environments.
Paper Details
Title: DeepAgent: A General Reasoning Agent with Scalable Toolsets
arXiv: https://arxiv.org/abs/2510.21618
GitHub: https://github.com/Rednote-DeepExperience/DeepAgent
Demonstrations
Massive Toolset – 16,000+ Real APIs
The agent can autonomously search, filter, and invoke the most suitable tool from a library of over 16,000 RapidAPI endpoints, completing complex tasks through continuous reasoning. In the demo, unavailable APIs were simulated by an LLM.
Embodied Intelligence in ALFWorld
Using a plug‑in action set (move, observe, pick), DeepAgent navigates the ALFWorld embodied AI environment, executing goal‑directed tasks such as object manipulation and exploration.
Research Assistant Capabilities
Equipped with web search, content extraction, code execution, visual question answering, and file handling tools, DeepAgent serves as a research assistant for scholarly investigations.
Motivation and Limitations of Existing Agents
Traditional workflow constraints: Frameworks like ReAct and Plan‑and‑Solve follow a rigid “think‑act‑observe” loop, lacking global task awareness and autonomous execution; they require pre‑specified tools.
Limited toolsets in current agents: Recent agents such as Search‑o1 and WebThinker integrate only narrow tool families (e.g., search, browsing), insufficient for diverse real‑world demands.
Core Design of DeepAgent
Unified Autonomous Reasoning Core
DeepAgent places a powerful LLM at the center of a continuous reasoning chain. When external interaction is needed, the model emits special tokens such as <tool_search>... to request a tool search or <tool_call>... to invoke a tool. The system captures these tokens, performs the operation, and feeds the result back, enabling on‑demand tool usage without a fixed toolbox.
An auxiliary LLM handles lengthy tool documentation and filters noisy tool outputs, keeping the main reasoning focused.
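The control loop above can be sketched as follows. This is a hypothetical illustration, not DeepAgent's actual implementation: the tag names follow the paper's examples, while `search_tools` and `call_tool` are stand-in stubs for the retrieval and execution components.

```python
import re

# Match <tool_search>...</tool_search> or <tool_call>...</tool_call>
# emitted inside the model's reasoning stream.
TAG_RE = re.compile(r"<(tool_search|tool_call)>(.*?)</\1>", re.DOTALL)

def search_tools(query):
    # Stub: dense retrieval over the 16,000+ tool library would go here.
    return f"[top tools for: {query}]"

def call_tool(spec):
    # Stub: parse the call spec and invoke the chosen API.
    return f"[result of: {spec}]"

def run_agent_step(model_output):
    """Scan one chunk of model output for special tokens and return
    the observations to feed back into the reasoning chain."""
    observations = []
    for tag, body in TAG_RE.findall(model_output):
        if tag == "tool_search":
            observations.append(search_tools(body.strip()))
        else:
            observations.append(call_tool(body.strip()))
    return observations

obs = run_agent_step(
    "I need weather data. <tool_search>weather forecast API</tool_search>"
)
print(obs)  # ['[top tools for: weather forecast API]']
```

The key design point is that tool usage is interleaved with generation: the system intercepts each special token, runs the operation, and resumes decoding with the result appended, so no fixed toolbox has to be declared up front.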
Autonomous Memory Folding
At any critical point (e.g., after a sub‑task or when a dead‑end is detected), DeepAgent can emit a <fold_thought> command. An auxiliary model compresses the interaction history into a structured memory summary, allowing the agent to “take a breath” and rethink its strategy, reducing computation and improving success on long‑horizon tasks.
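A minimal sketch of the folding step, under assumptions: the trivial truncating summarizer below stands in for the auxiliary LLM, and the three summary aspects mirror the memory types described in the next section.

```python
# Hypothetical sketch of memory folding: compress the raw interaction
# history into a compact structured summary and continue reasoning from it.

def fold_memory(history, summarize):
    """Replace the raw interaction history with a structured summary,
    keeping the context short on long-horizon tasks."""
    return {
        "episodic": summarize("milestones and key decisions", history),
        "working": summarize("current sub-goal and next steps", history),
        "tool": summarize("tools used and how they behaved", history),
    }

def naive_summarize(aspect, history):
    # Stand-in summarizer: a real system would prompt the auxiliary LLM.
    return f"{aspect}: {' | '.join(history[-2:])}"

folded = fold_memory(
    ["searched APIs", "called weather tool", "got forecast"],
    naive_summarize,
)
```

After folding, the agent's context holds only `folded` instead of the full trace, which is what lets it "take a breath" and re-plan without the accumulated noise.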
Structured Memory Inspired by the Human Brain
All historical information is stored in a stable JSON format, organized into three memory types:
Episodic Memory: Records high‑level milestones, key decisions, and major events for long‑term reflection and strategic planning.
Working Memory: Short‑term cache for current sub‑goals, immediate challenges, and next‑step plans, ensuring continuity across folding operations.
Tool Memory: Self‑updating handbook that logs each tool’s usage pattern, performance, and outcomes, enabling refined tool‑selection over time.
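The three memory types above might be laid out as a JSON object along these lines. The field names here are illustrative assumptions, not the exact schema from the paper.

```python
import json

# Hypothetical three-part memory layout in the stable JSON format
# the paper describes: episodic, working, and tool memory.
memory = {
    "episodic": {
        "milestones": ["located a flight-search API", "validated a test query"],
        "key_decisions": ["prefer APIs that do not require auth"],
    },
    "working": {
        "current_subgoal": "compare prices for two travel dates",
        "challenges": ["one API returned empty results"],
        "next_steps": ["retry with an alternate endpoint"],
    },
    "tool": {
        "flight_search_api": {
            "calls": 3,
            "successes": 2,
            "note": "requires IATA airport codes",
        },
    },
}

print(json.dumps(memory, indent=2))
```

Because the format is stable JSON rather than free text, each fold can update the relevant section in place, and the tool memory accumulates into the "self-updating handbook" that informs later tool selection.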
ToolPO – Specialized Reinforcement Learning Framework
ToolPO (Tool Policy Optimization) is an end‑to‑end RL method for universal tool usage, introducing two innovations:
LLM Tool Simulator: During training, an auxiliary LLM simulates API responses, avoiding costly and unstable calls to thousands of real APIs.
Dual‑Advantage Attribution: Rewards are split into a global success reward and a per‑tool correctness reward. Advantage attribution ensures that only tokens responsible for a tool call receive the tool‑specific reward, leading to precise and efficient learning.
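Dual-advantage attribution can be illustrated with a toy per-token advantage computation. This is a simplified sketch under assumptions: the additive weighting and the function names are illustrative, not ToolPO's actual formulation.

```python
# Hypothetical sketch: every token receives the global task advantage,
# but only tokens inside a tool-call span also receive the
# tool-correctness advantage for that call.

def token_advantages(num_tokens, tool_spans, global_adv, tool_adv):
    """Return a per-token advantage list.

    tool_spans: (start, end) index ranges covering tool-call tokens,
    end exclusive.
    """
    adv = [global_adv] * num_tokens
    for start, end in tool_spans:
        for i in range(start, end):
            adv[i] += tool_adv  # credit only tokens that emitted the call
    return adv

# 10-token trajectory; tokens 4..6 emitted a tool call that succeeded.
adv = token_advantages(10, [(4, 7)], global_adv=0.5, tool_adv=1.0)
print(adv)  # [0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 0.5, 0.5, 0.5]
```

The effect is that a correct tool call is reinforced at exactly the tokens that produced it, rather than diluting the credit across the whole trajectory.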
Experimental Evaluation
DeepAgent was evaluated on eight benchmarks covering general tool usage and downstream applications.
TMDB benchmark: Success rate 89.0%, versus 55.0% for comparable models.
ToolBench (16,000+ APIs): Success rate 64.0%.
Embodied AI (ALFWorld), Online Shopping (WebShop), General AI Assistant (GAIA): State‑of‑the‑art scores, e.g., GAIA 53.3.
Ablation Studies
Removing end‑to‑end ToolPO training caused the largest performance drop.
Disabling memory folding significantly reduced GAIA performance.
Omitting the LLM tool simulator or advantage attribution each degraded results.
Dynamic Tool Retrieval vs. Pre‑retrieval
Traditional agents retrieve all possible tools before execution. DeepAgent discovers tools on‑the‑fly while reasoning, showing superior performance especially with tens of thousands of APIs.
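A toy sketch of the contrast: pre-retrieval fixes the toolset from the initial task description, while on-the-fly retrieval can issue a fresh query whenever a new need emerges mid-task. The tiny library and keyword-overlap scoring below are stand-ins for real dense retrieval over 16,000+ APIs.

```python
# Hypothetical tool library: name -> short description.
TOOL_LIBRARY = {
    "get_weather": "current weather forecast by city",
    "search_flights": "find flights between airports",
    "currency_convert": "convert amounts between currencies",
}

def retrieve(query, k=1):
    """Rank tools by keyword overlap with the query (toy scorer)."""
    words = set(query.lower().split())
    scored = sorted(
        TOOL_LIBRARY.items(),
        key=lambda kv: -len(words & set(kv[1].split())),
    )
    return [name for name, _ in scored[:k]]

# Dynamic retrieval: the agent queries the full library at the moment
# a need appears, instead of guessing the toolset before execution.
print(retrieve("what is the weather forecast in Paris"))  # ['get_weather']
```

With tens of thousands of APIs, pre-retrieving a fixed subset risks missing the tool a later reasoning step turns out to need; querying at each step avoids that failure mode.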
Scalability with Action Steps
Increasing the maximum allowed reasoning steps consistently improves DeepAgent’s performance, widening the gap with ReAct‑based baselines.
Generalization Across Model Sizes
Applied to both 30B‑parameter and 235B‑parameter LLM backbones, DeepAgent outperformed traditional agents, and performance scaled with model size.
Conclusion and Outlook
DeepAgent establishes a new benchmark for universal AI agents by combining a unified autonomous reasoning core, memory folding, structured memory, and the ToolPO training framework. Future work includes more advanced memory management, natural human‑AI collaboration, and multi‑agent systems.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.