Why Planning Boosts Multi‑Tool Agent Performance and How to Implement It
This article explains the importance of planning for multi‑tool AI agents, compares OpenAI and Anthropic approaches, presents experimental results, and provides practical guidance on tool design, prompt configuration, model selection, and parallel versus serial tool calls to improve efficiency and effectiveness.
Why Planning Matters
Both Anthropic and OpenAI recommend that multi‑tool agents perform explicit planning before invoking tools, which significantly improves task success rates (e.g., a 4% increase on SWE‑bench).
OpenAI Prompt Example
You MUST plan extensively before each function call and reflect on the outcomes of previous calls. DO NOT rely solely on function calls, as this can impair problem‑solving and insightful thinking.
Translation: Before every function call, the model must plan thoroughly and reflect on the results; avoiding a pure sequence of calls preserves reasoning ability.
Anthropic’s "Think" Tool
Anthropic introduces a "think" tool that lets the model log thoughts without fetching new information, aiding complex reasoning and cache memory.
{
"name": "think",
"description": "Use the tool to think about something. It will not obtain new information or change the database, but just append the thought to the log. Use it when complex reasoning or some cache memory is needed.",
"input_schema": {
"type": "object",
"properties": {
"thought": {"type": "string", "description": "A thought to think about."}
},
"required": ["thought"]
}
}Anthropic evaluated the tool with the τ‑bench benchmark, showing substantial gains in airline (54% improvement) and retail domains.
Chosen Solution
We selected Anthropic’s tool‑based approach because OpenAI’s planning relies on post‑training instruction compliance, which is weaker for open‑source models without fine‑tuning. Using a structured tool improves compliance and output consistency.
Tool calls follow a fixed schema (thought, plan, action, thoughtNumber) to ensure structured, complete outputs.
Explicit tool calls are clearer than vague “plan” instructions, especially in complex, multi‑tool scenarios.
Implementation in Our Internal Agent Platform
We demonstrate the implementation using our internal platform; the same principles apply to LangChain or other agent frameworks.
Model Selection
DeepSeek V3 Function Call model currently offers strong planning and tool‑call capabilities.
End‑to‑End Loop Mode
The agent repeatedly calls the model until either the model stops invoking tools and replies directly, or a predefined call limit is reached.
Parallel vs. Serial Tool Calls
Parallel calls allow a single model invocation to output multiple independent tools, while serial calls wait for each tool’s result before proceeding. Parallel calls can cause hallucinations if tools depend on each other, so they are disabled by default.
Tool Configuration Example
{
"name": "思考和规划",
"id": "think_and_plan",
"description": "Systematic thinking and planning tool for complex tasks. It logs thought, plan, action, and thoughtNumber without accessing new data.",
"input_schema": {
"type": "object",
"properties": {
"thought": {"type": "string", "description": "Current thinking content, analysis, hypothesis, or summary."},
"plan": {"type": "string", "description": "Step‑by‑step plan for the task."},
"action": {"type": "string", "description": "Concrete next action, possibly invoking other tools."},
"thoughtNumber": {"type": "string", "description": "Identifier for the current thinking step."}
},
"required": ["thought", "plan", "action", "thoughtNumber"]
}
}Agents must call this tool before any task‑specific tool and after each tool result to reflect and plan next steps.
Prompt Configuration
<instruction>
1. You are an agent; keep invoking tools until the user’s task is perfectly completed.
2. Never guess; always gather information via tools.
3. Before any task tool, **first call the 思考和规划 tool** to think, plan, and act, outputting thought, plan, action, thoughtNumber.
- The tool only logs thoughts; it does not fetch new data.
- After thinking you may call multiple task tools in parallel.
- After task tools finish, stop output; the system returns results, then you must call the 思考和规划 tool again.
</instruction>Additional business‑specific prompts should describe which tools to use for particular tasks (e.g., code interpreter, data overview) and how to sequence them.
Key Takeaways
Explicit planning via a dedicated tool improves model compliance and performance in multi‑tool scenarios.
Parallel tool calls are useful for independent tasks but can cause errors when dependencies exist.
Choosing a model with strong function‑call support (e.g., DeepSeek V3) is crucial.
Future work includes testing reasoning‑only models like Qwen3 to compare effectiveness of the thinking tool versus pure inference.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
