Engineering Architecture of Alibaba's AI Digital Employee "AI XiaoWan"

Alibaba’s AI digital employee “AI XiaoWan” uses a native multi‑agent architecture: a Controller Agent interprets intent, plans tasks, and orchestrates execution, while an Executable Agent performs domain‑specific operations. The agents communicate through a standardized Agent Communication Protocol and draw on a centralized Tool Center, a retrieval‑augmented knowledge base, and a data‑flywheel feedback loop for continuous improvement. The roadmap points toward memory‑based reasoning and self‑learning.

Alimama Tech

Alibaba's AI digital employee "AI XiaoWan" is an AI‑powered promotion assistant for advertisers on the Taobao platform. Through multi‑turn dialogue it offers personalized recommendations, knowledge‑base Q&A, quick data lookup, AI‑driven diagnostics, and tool‑assisted actions.

The system adopts a Native AI Agent model built on a vertical Multi‑Agent architecture. The core agents are the Controller Agent, which interprets user intent, plans tasks, and orchestrates execution, and the Executable Agent, which carries out domain‑specific operations such as data queries.

Agents communicate via a standardized Agent Communication Protocol that carries both raw and rewritten natural‑language instructions together with structured variables. The protocol is inspired by Google Langfun and the open‑source AgentProtocol.

Example of an agent input payload:

{
  "input": "xxx", // instantiated natural‑language call
  "additionalInput": {
    "rawInput": "xxx", // original request
    "rewriteInput": "xxx", // rewritten request
    "variable1": "xxx", // related variable
    ...
  },
  ...
}
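To make the input shape concrete, here is a minimal Python sketch that assembles such a payload. The helper name `build_agent_input` and the sample values (`plan_id`, `days`, the request text) are hypothetical and only illustrate how natural‑language fields and structured variables could travel together; they are not taken from the XiaoWan codebase.

```python
import json
from typing import Any


def build_agent_input(instruction: str, raw_input: str, rewrite_input: str,
                      **variables: Any) -> dict:
    """Assemble an input payload in the shape shown above (hypothetical helper)."""
    additional = {"rawInput": raw_input, "rewriteInput": rewrite_input}
    # Structured variables sit next to the natural-language fields.
    additional.update(variables)
    return {"input": instruction, "additionalInput": additional}


# Made-up request, used only to exercise the helper.
payload = build_agent_input(
    instruction="Query spend for plan 123 over the last 7 days",
    raw_input="how much did that plan spend last week?",
    rewrite_input="How much did plan 123 spend in the last 7 days?",
    plan_id="123",
    days=7,
)
print(json.dumps(payload, indent=2))
```

Keeping both the raw and the rewritten request in the payload lets a downstream agent fall back to the user's original wording if the rewrite loses context.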

Example of an agent output payload:

{
  "status": "success|failed|unsupported", // execution result
  "outputType": "standardized type",
  "output": "xxx | [] | {}",
  "outputSchema": {},
  "additionalOutput": {},
  "tasks": [
    {
      "task_id": "xxx",
      "name": "xx",
      "type": "agent, tool, function, xx",
      "output": "xxx | [] | {}",
      "additionalOutput": {},
      "input": {}
    }
  ],
  ...
}
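A caller would typically validate the `status` field and unpack the nested `tasks` before acting on a result. The sketch below shows one way to do that in Python; the class names `AgentOutput` and `TaskRecord`, the function `parse_agent_output`, and the sample payload are assumptions made for illustration, and only a subset of the fields above is modeled.

```python
from dataclasses import dataclass, field
from typing import Any

# The three status values listed in the output payload above.
VALID_STATUSES = {"success", "failed", "unsupported"}


@dataclass
class TaskRecord:
    task_id: str
    name: str
    type: str
    output: Any = None


@dataclass
class AgentOutput:
    status: str
    output_type: str
    output: Any
    tasks: list = field(default_factory=list)


def parse_agent_output(payload: dict) -> AgentOutput:
    """Validate the status and convert nested task dicts into records."""
    status = payload.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    tasks = [
        TaskRecord(t["task_id"], t["name"], t["type"], t.get("output"))
        for t in payload.get("tasks", [])
    ]
    return AgentOutput(status, payload.get("outputType", ""),
                       payload.get("output"), tasks)


# Made-up payload for illustration only.
example = {
    "status": "success",
    "outputType": "table",
    "output": [{"plan": "A", "spend": 120.0}],
    "tasks": [{"task_id": "1", "name": "query_report", "type": "tool",
               "output": [{"plan": "A", "spend": 120.0}]}],
}
result = parse_agent_output(example)
print(result.status, [t.name for t in result.tasks])
```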

The Controller Agent uses a Planner module to decompose user requests, generate data‑query commands, and store intermediate results in a Working Memory (e.g., variable tb1f88f642f5). The Executable Agent then executes the plan, invoking tools from the centralized Tool Center.
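The interplay between plan steps and Working Memory can be sketched roughly as follows. This is a toy model under stated assumptions: the `WorkingMemory` class, the `run_plan` function, and the handle format (a `t` followed by 11 hex characters, chosen only to resemble the example variable name) are hypothetical, since the article does not describe the real naming scheme or planner interface.

```python
import uuid
from typing import Any


class WorkingMemory:
    """Stores intermediate results under opaque handles (assumed design)."""

    def __init__(self) -> None:
        self._store = {}

    def put(self, value: Any) -> str:
        # Handle format is a guess that merely looks like the example variable.
        handle = "t" + uuid.uuid4().hex[:11]
        self._store[handle] = value
        return handle

    def get(self, handle: str) -> Any:
        return self._store[handle]


def run_plan(steps, memory: WorkingMemory) -> str:
    """Run steps in order; each step reads the previous result from memory."""
    handle = None
    for step in steps:
        previous = memory.get(handle) if handle is not None else None
        handle = memory.put(step(previous))
    return handle


# Two hypothetical steps: a stand-in data query, then an aggregation.
steps = [
    lambda _: [{"plan": "A", "spend": 120.0}, {"plan": "B", "spend": 80.0}],
    lambda rows: sum(row["spend"] for row in rows),
]
memory = WorkingMemory()
final_handle = run_plan(steps, memory)
print(final_handle, memory.get(final_handle))
```

Passing handles rather than raw values keeps large query results out of the natural‑language instructions exchanged between agents.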

Tool‑Use capabilities are standardized through a Tool Center that supports HTTP/RPC registration, Python Playground development, and management of positive, negative, and evaluation samples. Tools are tagged and indexed, enabling vector‑plus‑tag retrieval and in‑context learning for dynamic tool selection.
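Vector‑plus‑tag retrieval can be illustrated with a small sketch: filter candidate tools by tag first, then rank the survivors by similarity to the query. Everything here is assumed for illustration, including the `Tool` record, the tool names, and the bag‑of‑words `embed` function, which is a toy stand‑in for a real embedding model.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str
    tags: frozenset


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, required_tags: set, tools: list, top_k: int = 2) -> list:
    """Keep tools carrying all required tags, then rank by similarity."""
    query_vec = embed(query)
    candidates = [t for t in tools if required_tags <= t.tags]
    ranked = sorted(candidates,
                    key=lambda t: cosine(query_vec, embed(t.description)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical tool registry.
TOOLS = [
    Tool("query_report", "query campaign spend report data", frozenset({"data"})),
    Tool("adjust_budget", "adjust campaign daily budget", frozenset({"action"})),
    Tool("diagnose_plan", "diagnose campaign performance data drop",
         frozenset({"data", "diagnosis"})),
]

hits = retrieve_tools("campaign spend report", {"data"}, TOOLS)
print([t.name for t in hits])
```

The retrieved tools, together with their stored positive and negative samples, could then be placed in the prompt as in‑context examples for the model that makes the final selection.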

A knowledge base built for XiaoWan provides end‑to‑end Retrieval‑Augmented Generation (RAG). It supports configurable ingestion pipelines, multi‑dimensional indexing, chunking, and both offline and real‑time index construction. Knowledge items are versioned, can be updated automatically, and are linked to the agents for prompt‑based retrieval.
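A stripped‑down sketch of the ingestion and retrieval path might look like this: chunk each knowledge item with some overlap, replace older versions on update, and return the best‑matching chunk for a query. The `KnowledgeIndex` class, the chunk sizes, the keyword‑overlap scoring, and the sample FAQ text are all assumptions for illustration; a production RAG pipeline would use proper embeddings and a real index.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    version: int
    text: str


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> list:
    """Split text into word windows of `size` that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


class KnowledgeIndex:
    def __init__(self) -> None:
        self._chunks = {}  # doc_id -> list of Chunk

    def upsert(self, doc_id: str, version: int, text: str) -> None:
        current = self._chunks.get(doc_id)
        if current and current[0].version >= version:
            return  # ignore stale or duplicate versions
        self._chunks[doc_id] = [Chunk(doc_id, version, c) for c in chunk_text(text)]

    def search(self, query: str, top_k: int = 1) -> list:
        terms = set(query.lower().split())
        scored = [(len(terms & set(c.text.lower().split())), c)
                  for chunks in self._chunks.values() for c in chunks]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:top_k]]


# Made-up knowledge items, for illustration only.
index = KnowledgeIndex()
index.upsert("budget-faq", 1,
             "The daily budget can be changed once per day in the campaign settings page")
index.upsert("budget-faq", 2,
             "The daily budget can be changed at any time in the campaign settings page")
index.upsert("refund-faq", 1,
             "Refunds for unused balance are processed within several working days")

hits = index.search("change daily budget")
prompt = "Answer using this context:\n" + "\n".join(c.text for c in hits)
print(hits[0].version, prompt)
```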

Data‑flywheel mechanisms collect user feedback, label problematic cases, and feed them back into evaluation pipelines (manual and automated). Evaluation tasks generate reports with metric trends, driving continuous improvement of the AI assistant.
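One turn of such a flywheel could be sketched as: gather feedback, pull out the labeled problem cases as an evaluation set, and track a metric per release. The `Feedback` record, the label names, the release identifiers, and the satisfaction metric are hypothetical; the article does not specify which metrics the real reports contain.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    release: str
    question: str
    thumbs_up: bool
    label: str = ""  # filled in by a reviewer for problematic cases


def bad_cases(feedback: list) -> list:
    """Negative feedback that a reviewer has already labeled."""
    return [f for f in feedback if not f.thumbs_up and f.label]


def satisfaction_by_release(feedback: list) -> dict:
    """Share of positive feedback per release, in release order."""
    totals = defaultdict(lambda: [0, 0])  # release -> [positive, total]
    for f in feedback:
        totals[f.release][0] += int(f.thumbs_up)
        totals[f.release][1] += 1
    return {release: up / n for release, (up, n) in sorted(totals.items())}


# Made-up feedback log.
log = [
    Feedback("v1", "q1", True),
    Feedback("v1", "q2", False, label="wrong_tool"),
    Feedback("v2", "q1", True),
    Feedback("v2", "q2", True),
    Feedback("v2", "q3", True),
    Feedback("v2", "q4", False, label="stale_knowledge"),
]

eval_set = [f.question for f in bad_cases(log)]
trend = satisfaction_by_release(log)
print(eval_set, trend)
```

Re‑running the labeled cases against each new release is what closes the loop: a case that keeps failing shows up in the trend rather than disappearing into the logs.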

Future work focuses on enhancing memory‑based perception, System‑2 reasoning, and a full DIKW (data, information, knowledge, wisdom) learning loop to enable self‑evolution of the digital employee.
