How Multi‑Agent ReAct Architecture Boosts E‑Commerce AI Assistants
This article explains the evolution of multi‑agent systems for e‑commerce assistants, detailing the ReAct‑based planning framework, hierarchical master‑sub agent collaboration, evaluation methods, and sample‑generation techniques that together improve accuracy, efficiency, and scalability of AI‑driven merchant services.
Introduction
The multi‑agent architecture for merchant assistants has evolved through three stages: (1) B‑mall automatic ticket reply using LLM + RAG without tool invocation; (2) the JD merchant onboarding site (JD招商站), where a single agent handled both knowledge‑base QA and tool calls but suffered from low accuracy and hallucinations; (3) the JD‑Mai intelligent assistant, which introduces a master/sub‑agent collaborative model that significantly improves accuracy.
The assistant’s algorithmic foundation is a Large Language Model‑based multi‑agent system that mimics real‑world merchant team collaboration, allowing merchants to interact via natural language to obtain 24/7 operational support.
1. Mapping Real‑World Merchant Operations to Multi‑Agent Algorithm Space
The design motivation is to simulate human problem‑solving processes with agents. The real‑world operations of a merchant and their team are described first, and each human role is then mapped to an agent in the AI space.
2. Key Technologies of Multi‑Agent Planning
2.1 Agent Construction: ReAct Paradigm Multi‑Model Integration
LLM: interprets the problem, extracts the ultimate goal, guides reverse planning, and validates the tool‑call chain.
Embedding: quickly matches the goal node to tools, avoiding lengthy prompts and hallucinated tool selection.
Tools DAG: performs multi‑path reverse reasoning, extracting parameters for precise scheduling.
Operations Optimization: theoretically accelerates solving and improves reverse‑planning efficiency (pending empirical validation).
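The embedding step above can be sketched as follows. This is a minimal illustration only: the toy bag‑of‑words `embed` stands in for a trained embedding model, and the tool names and descriptions are hypothetical, not the assistant's real registry.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production system would use a
    # trained embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical tool registry: name -> natural-language description.
TOOLS = {
    "sales_report": "query daily sales and order statistics for a store",
    "coupon_create": "create a discount coupon for a promotion campaign",
    "ticket_reply": "draft a reply to a customer service ticket",
}

def match_tool(goal: str) -> str:
    # Pick the tool whose description is closest to the goal node,
    # instead of listing every tool in a lengthy LLM prompt.
    return max(TOOLS, key=lambda name: cosine(embed(goal), embed(TOOLS[name])))

print(match_tool("show me yesterday's sales statistics"))  # sales_report
```

Matching against tool descriptions in embedding space bounds tool selection to the registry, which is what prevents hallucinated tool names.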
ReAct enables dynamic updates: each step of forward execution triggers a planning update based on the observed result.
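The dynamic‑update loop can be sketched as below; `plan` and `act` are hypothetical stand‑ins for the real LLM planner and tool executor.

```python
def react_loop(goal, plan, act, max_steps=5):
    """Minimal ReAct cycle: each executed step feeds its observation
    back into planning, so the plan is updated dynamically."""
    history = []
    for _ in range(max_steps):
        thought, action = plan(goal, history)   # reason over observations so far
        if action is None:                      # planner decides we are done
            return thought, history
        observation = act(action)               # forward execution of one step
        history.append((thought, action, observation))  # triggers re-planning
    return "max steps reached", history

# Stub planner: call the (hypothetical) sales tool once, then answer.
def plan(goal, history):
    if not history:
        return "need sales data first", "sales_report"
    return f"answer using {history[-1][2]}", None

def act(action):
    return {"sales_report": "1,234 orders"}[action]

answer, trace = react_loop("how did my store do yesterday?", plan, act)
print(answer)  # answer using 1,234 orders
```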
2.2 Multi‑Agent Online Inference
2.2.1 Technical Features
Hierarchical dynamic planning and distributed collaboration based on the ReAct paradigm, with a Master Agent coordinating sub‑agents.
Master Agent decomposes complex scenarios into independent sub‑tasks and dispatches them to Sub Agents.
Sub Agents execute assigned tasks, supporting distributed scheduling and cooperation.
Standard communication protocol ensures efficient multi‑agent coordination, multi‑step linking, and global chain‑of‑thought planning.
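The decompose‑and‑dispatch pattern above can be sketched as follows. The sub‑agent registry, the hard‑coded decomposition, and the JSON message envelope are all illustrative assumptions; the real system would use an LLM for decomposition and its own communication protocol.

```python
import json

# Hypothetical sub-agent registry keyed by domain.
SUB_AGENTS = {
    "data": lambda task: f"[data agent] report for {task}",
    "marketing": lambda task: f"[marketing agent] plan for {task}",
}

def master_agent(query: str) -> str:
    # Step 1: decompose the scenario into independent sub-tasks.
    # Hard-coded here; a real Master Agent would plan this with an LLM.
    subtasks = [("data", "last week's sales"), ("marketing", "weekend promotion")]
    # Step 2: dispatch each sub-task over a standard message schema.
    results = []
    for agent, task in subtasks:
        message = json.dumps({"agent": agent, "task": task})  # protocol envelope
        payload = json.loads(message)
        results.append(SUB_AGENTS[payload["agent"]](payload["task"]))
    # Step 3: merge sub-agent results into one answer.
    return " | ".join(results)

print(master_agent("help me review sales and plan a promotion"))
```

Because every dispatch goes through the same message schema, sub‑agents can be scheduled independently or distributed across services without changing the Master Agent.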
2.2.2 Demonstration
A video demonstrates the online collaborative inference process, showing how the front‑end assistant UI maps to the back‑end multi‑agent inference service.
2.2.3 Architectural Summary
Low inference difficulty: transforms large‑model multi‑step planning into next‑task prediction.
Low cost: multiple small models cooperate, reducing training and deployment expenses.
Fast iteration: rapid problem localization enables quick model updates.
Open challenges include long response time for complex queries, error accumulation in chained reasoning, and the need for multi‑agent joint learning to mitigate risks. Compared with single‑agent or LLM‑MoE architectures, the multi‑agent design offers higher stability for complex business scenarios at the cost of increased engineering effort.
2.3 Agent Full‑Link ReAct Evaluation
Global evaluation: decomposes tasks and schedules them, assigning weighted scores to each agent to compute overall system efficiency.
Local evaluation: uses a Reward Model to assess thought/action/observation cycles, identifying bottlenecks and suggesting optimizations.
Diverse Reward Models: business‑customizable rules, existing high‑level LLMs for general evaluation, and trained reward models for task‑specific assessment.
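The global weighted scoring and a rule‑based local reward model might combine as in the sketch below; the weights, scores, and penalty rules are illustrative assumptions, not the production configuration.

```python
# Per-agent local scores from a reward model (illustrative values).
agent_scores = {"master": 0.9, "data_sub": 0.8, "marketing_sub": 0.6}
# Business-assigned weights reflecting each agent's importance.
weights = {"master": 0.5, "data_sub": 0.3, "marketing_sub": 0.2}

def global_score(scores, weights):
    """Weighted sum over agents = overall system efficiency."""
    return sum(scores[a] * weights[a] for a in scores)

def rule_reward(step):
    """Rule-based reward model for one thought/action/observation cycle."""
    score = 1.0
    if step["action"] not in step["allowed_tools"]:
        score -= 0.5          # penalize hallucinated tool calls
    if not step["observation"]:
        score -= 0.3          # penalize empty observations
    return max(score, 0.0)

print(round(global_score(agent_scores, weights), 2))  # 0.81
```

A low local reward pinpoints which thought/action/observation cycle is the bottleneck, while the global score tracks whether a fix actually improves the system end to end.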
2.4 LLM Offline/Online Sample Enhancement
Automated offline sample generation by standardizing business data, enabling rapid creation of high‑quality training data for various scenarios.
Automated online inference labeling and sample accumulation using multiple Reward Model strategies, continuously expanding and refining the sample library to improve online inference capability.
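One way to sketch reward‑filtered sample accumulation is shown below; the threshold and the toy success‑rate reward model are illustrative assumptions.

```python
def label_and_collect(traces, reward_model, threshold=0.8):
    """Automatically label online inference traces and keep high-reward
    ones as training samples (threshold is an illustrative choice)."""
    samples = []
    for trace in traces:
        score = reward_model(trace)
        if score >= threshold:
            samples.append({"trace": trace, "reward": score})
    return samples

# Toy reward model: fraction of steps whose tool call succeeded.
reward = lambda trace: sum(s["ok"] for s in trace) / len(trace)

traces = [
    [{"ok": True}, {"ok": True}],   # reward 1.0 -> kept
    [{"ok": True}, {"ok": False}],  # reward 0.5 -> dropped
]
print(len(label_and_collect(traces, reward)))  # 1
```

Running this continuously over online traffic is what lets the sample library grow without manual annotation.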
Appendix: Agent ReAct Execution Flow
Step 1: The caller initiates a request. Only the user can call the Master Agent; domain agents can be called by the Master or by other domain agents.
Step 2: The agent performs planning/reasoning, retrieving conversation history from Memory.
Step 3: Reasoning generates a thought (natural‑language description of the goal) and an action_code (structured list of tasks).
Step 4: The agent executes the tool calls defined in action_code.
Step 5: Each called tool returns its results.
Step 6: The agent writes the ReAct information to Memory and to logs.
Step 7: The agent responds according to the trust mode: either a direct response, or further ReAct cycles.
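One turn of Steps 1–7 can be sketched as below; `llm_reason` and the `tools` registry are hypothetical stand‑ins for the real model and tool set, and the further‑cycles branch of Step 7 is omitted for brevity.

```python
class Memory:
    """Conversation memory shared across ReAct steps."""
    def __init__(self):
        self.log = []
    def history(self):
        return list(self.log)          # step 2: retrieve history
    def write(self, record):
        self.log.append(record)        # step 6: persist ReAct info

def agent_turn(query, memory, llm_reason, tools):
    """One agent turn following Steps 1-7."""
    context = memory.history()                          # step 2
    thought, action_code = llm_reason(query, context)   # step 3
    results = [tools[t["tool"]](t["args"])              # step 4: execute calls
               for t in action_code]                    # step 5: collect results
    memory.write({"thought": thought,
                  "action_code": action_code,
                  "results": results})                  # step 6
    return results                                      # step 7: direct response

# Stubs for a single-step demo.
reason = lambda q, ctx: ("look up sales",
                         [{"tool": "sales_report", "args": "yesterday"}])
tools = {"sales_report": lambda args: f"sales for {args}: 1,234 orders"}

mem = Memory()
print(agent_turn("how were sales?", mem, reason, tools))
```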
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among AI, cloud computing, IoT, and related developers. It publishes JD product technical information, industry content, and tech event news.