Artificial Intelligence 16 min read

Tackling Real‑World Challenges in Multi‑Agent React: From ToolCalls to Context Compression

This article analyzes production‑grade issues of a multi‑agent React framework—such as long ToolCall latency, context bloat, missing intermediate states, loop control, and supervision gaps—and presents concrete XML‑based tool‑call prompts, context‑compression techniques, summary tools, and a plug‑and‑play MCP supervisor that together improve performance, reliability, and user‑facing output quality.

Alibaba Cloud Developer

Sep 9, 2025

Tackling Real‑World Challenges in Multi‑Agent React: From ToolCalls to Context Compression

React模式挑战点

In production, multi‑agent collaboration uses many patterns (hierarchical, nested, hand‑off, group chat), but the most common is hierarchical command where a main agent splits tasks and sub‑agents execute them (e.g., Cursor, Aone Copilot, Manus). Our autonomous planning mode built on WhaleSdk + ElemeMcpClient faces several typical challenges.

Long response time caused by large‑model ToolCalls.

Missing dynamic compression and traceability in context communication.

Insufficient intermediate states generated by the main agent.

Unintelligent loop termination.

Inadequate supervision of the planned plan.

We propose solutions for each issue and share the resulting improvements.

Better Prompt for Tool Calls than FunctionCall

FunctionCall often leads to long waiting times and is not supported by all models. We adopt a streaming XML format to return tool names and arguments while simultaneously streaming the model’s reasoning (Thought). This lets users see the agent’s thinking process.

Respond to the human as helpfully and accurately as possible.
1. You are an agent, call tools until the task is perfectly completed, stop only when the problem is solved.
2. Use tools to collect information, never guess.
3. Some tools produce documents; only return the document title in the arguments.
4. **Important**: The final answer must be a complete, detailed response, not just a task summary.
5. Start by splitting the task, give reasoning and planning, then execute step by step, adjusting the plan based on tool results.

You have the following tools:
%s // tool list placeholder

Return the reason, tool name and arguments for each call, one operation at a time, using the format:
<tool_name>ToolName</tool_name>
<arguments>{"key":"value"}</arguments>

Using XML enables streaming rendering, works with models that don’t support ToolCalls, and can be combined with multimodal models for future extensions.

Simple Context Compression

When tool calls generate long contexts, inference slows, sharing original text with sub‑agents becomes difficult, and model costs increase. We introduce a reference‑based approach: store long texts as files, keep only reference IDs in the context, and retrieve the full text when needed.

We also define special generation tools (e.g., PRD generation, text rewriting) that are invoked automatically when the context becomes too large.

Ensuring Meaningful Summaries

Typical end‑of‑task detection merely checks for an empty tool list, which yields terse, user‑unfriendly summaries. We add a dedicated summary tool that, after the main task finishes, calls the base model once more to produce a detailed, user‑oriented report.

Comparisons show the concise default output versus the rich summary generated by the new tool.

Maintaining the Original Planning Trajectory

Without supervision, agents may drift from the initial plan or enter dead loops. We implement a plug‑and‑play MCP (monitor‑control‑plan) service that acts as a supervisory tool, updating a todo list after each tool call and ensuring the overall plan stays on track.

Demo screenshots illustrate the planning, execution, and final high‑quality output.

Conclusion

We analyzed performance and experience issues of our React‑based multi‑agent assistant in production, presented optimization strategies for tool calls, context handling, summarization, and supervision, and shared empirical gains. The approach is open for further refinement as larger models become available.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

AI Planning Context Compression ReAct pattern tool calls

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.