Multi‑Agent Architecture for an E‑Commerce Business Assistant: Design, Planning, Evaluation, and Sample Generation
The document describes the evolution, design principles, key technologies, online inference workflow, evaluation methods, and sample‑generation techniques of a large‑language‑model‑based multi‑agent system that powers a 24/7 e‑commerce merchant assistant, highlighting its benefits, challenges, and future work.
Introduction – The merchant assistant is built on a large language model (LLM) driven multi‑agent system that mimics the collaborative workflow of real‑world e‑commerce teams. It offers 24/7 business support through natural‑language interaction and has evolved through three stages, culminating in a master‑plus‑sub‑agents architecture that significantly improves accuracy.
From Real‑World Business to Multi‑Agent Space – The system maps multiple real‑world merchant roles to agents, providing a generic, open host for capabilities such as sales forecasting, marketing, pricing, and keyword recommendation. Tools (agents, APIs) can be added at any development stage.
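The "generic, open host" idea can be pictured as a tool registry into which agents and APIs are plugged at any point in development. The sketch below is a minimal illustration; the class and tool names (`AgentHost`, `sales_forecast`, `keyword_recommend`) are assumptions, not the system's actual identifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[..., str]

@dataclass
class AgentHost:
    """Open host: new tools (agents, APIs) can be registered at any stage."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, **kwargs) -> str:
        return self.tools[name].run(**kwargs)

host = AgentHost()
host.register(Tool("sales_forecast", "Forecast next-week sales for a SKU",
                   lambda sku: f"forecast for {sku}: 120 units"))
host.register(Tool("keyword_recommend", "Suggest search keywords for a listing",
                   lambda title: f"keywords for {title}"))

print(host.call("sales_forecast", sku="SKU-42"))  # → forecast for SKU-42: 120 units
```

Because capabilities live behind a uniform `register`/`call` interface, adding a new merchant role (pricing, marketing, forecasting) does not require changing the host itself.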
2.1 Agent Construction – ReAct Paradigm with Multi‑Model Integration – Four model types are combined: LLM for goal extraction and validation, Embedding for fast tool matching, Tools DAG for multi‑path reverse reasoning, and Operations Research optimization for planning efficiency. ReAct enables dynamic planning updates after each execution step.
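Two of the four model roles above can be sketched concretely: the embedding model performs fast tool matching, and ReAct re-plans after each execution step. The vectors, tool names, and loop structure below are illustrative assumptions, not the production models.

```python
import math

# Toy embeddings: in the real system an embedding model maps text to vectors;
# these hand-made 3-d vectors exist purely for illustration.
TOOL_VECS = {
    "sales_forecast":    [0.9, 0.1, 0.0],
    "price_optimizer":   [0.1, 0.9, 0.0],
    "keyword_recommend": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_tool(goal_vec):
    """Embedding step: fast tool matching via cosine similarity."""
    return max(TOOL_VECS, key=lambda name: cosine(goal_vec, TOOL_VECS[name]))

def react_loop(goal_vec, max_steps=3):
    """ReAct: after every action the plan can be updated from the observation."""
    trace = []
    for step in range(max_steps):
        tool = match_tool(goal_vec)           # Act: pick the best-matching tool
        observation = f"{tool} executed"      # stub for a real tool call
        trace.append((step, tool, observation))
        # Thought: an LLM would inspect the observation and decide whether the
        # goal is met or the plan must change; here we stop after one step.
        break
    return trace

print(react_loop([0.8, 0.2, 0.1]))
```

The Tools DAG and operations-research planner would replace the trivial stop condition here, choosing the next tool by reverse reasoning over the dependency graph.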
2.2 Multi‑Agent Online Inference – A master agent decomposes complex tasks into sub‑agents that perform hierarchical dynamic planning and distributed scheduling. Communication follows a standard protocol, supporting multi‑step coordination and global chain‑of‑thought planning.
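A minimal view of master-to-sub-agent decomposition with a standardized message envelope is sketched below. The field names and routing rules are illustrative assumptions; the source only states that communication follows a standard protocol covering request, planning, reasoning, tool calls, logging, and response.

```python
import json

def make_message(sender, receiver, task, payload):
    """Minimal standardized envelope between the master and sub-agents
    (field names are hypothetical)."""
    return {"sender": sender, "receiver": receiver, "task": task, "payload": payload}

def master_decompose(query):
    """Master agent splits a complex query into sub-agent tasks.
    Routing is hard-coded here; the real system plans dynamically."""
    if "promotion" in query:
        return [
            make_message("master", "forecast_agent", "forecast_demand", {"query": query}),
            make_message("master", "pricing_agent", "suggest_discount", {"query": query}),
        ]
    return [make_message("master", "qa_agent", "answer", {"query": query})]

msgs = master_decompose("plan a promotion for winter jackets")
print(json.dumps(msgs, indent=2))
```

Each sub-agent would run its own hierarchical planning loop and reply with a message in the same envelope format, which is what makes distributed scheduling and global chain-of-thought tracking possible.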
2.2.1 Technical Features – Task‑layered planning, distributed collaboration, and a standardized communication protocol ensure efficient cooperation among agents.
2.2.2 Demonstration – A video showcases the end‑to‑end online inference process of the assistant.
2.3 Full‑Chain ReAct Evaluation – System‑wide evaluation aggregates weighted scores of each agent, while local evaluation uses a Reward Model to assess thought/action/observation quality, identifying bottlenecks.
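The system-wide aggregation can be expressed as a weighted sum over per-agent local scores. The particular weights and score scale below are assumptions for illustration; the source only says scores are weighted and aggregated.

```python
def system_score(agent_scores, weights):
    """System-wide evaluation: weighted sum of per-agent local scores.
    Weights must sum to 1 (an assumed convention)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(agent_scores[name] * w for name, w in weights.items())

# Local scores might come from a Reward Model rating each agent's
# thought/action/observation steps on a 0-1 scale.
scores  = {"master": 0.92, "forecast_agent": 0.85, "pricing_agent": 0.60}
weights = {"master": 0.5,  "forecast_agent": 0.3,  "pricing_agent": 0.2}

print(round(system_score(scores, weights), 3))  # → 0.835
```

Because the aggregate is decomposable, a low system score points directly at the lowest-scoring agent, which is how bottlenecks are identified.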
2.4 Reward Model Variants – Supports custom business rules, leverages existing SOTA LLMs, and allows training of dedicated reward models. Example prompt for evaluating intent‑summarization quality is shown below:
"The goal of the input-summarization model is to analyze the user's specific intent from the historical conversation records and the current question. As a core step in the Master Agent's reasoning, the quality of its intent summarization must be evaluated."

2.5 LLM Offline Sample Enhancement – Standardized business data are used to automatically generate and expand training samples for the LLMs, while online inference data are continuously labeled via reward‑model strategies, enriching the sample pool.
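Labeling online inference data with a reward model to grow the training pool can be sketched as a filter: traces whose reward clears a threshold are promoted to training samples. The threshold, field names, and stub reward function below are illustrative assumptions.

```python
def label_online_samples(samples, reward_fn, threshold=0.8):
    """Offline sample enhancement: keep online traces whose reward-model
    score clears a threshold so they can join the training pool."""
    pool = []
    for s in samples:
        score = reward_fn(s["query"], s["response"])
        if score >= threshold:
            pool.append({**s, "reward": score})
    return pool

# Stub standing in for a trained reward model or a SOTA-LLM judge.
def toy_reward(query, response):
    return 0.9 if "forecast" in response else 0.5

online = [
    {"query": "next week sales?", "response": "forecast: 120 units"},
    {"query": "hello",            "response": "hi there"},
]
print(label_online_samples(online, toy_reward))  # keeps only the first trace
```

Run continuously, this turns live traffic into a self-refreshing labeled corpus, which is what lowers the sample-engineering effort mentioned later.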
Challenges & Benefits – The architecture improves planning efficiency, reduces inference cost, enhances stability, mitigates LLM hallucination, lowers sample engineering effort, and enables rapid iteration. Remaining issues include longer response times for complex queries and error accumulation in chained reasoning, which are being addressed through multi‑agent joint learning.
References – The document lists the full interaction protocol (request, planning, reasoning, tool calls, logging, and response).
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.