How Alibaba Cloud’s Aivis Redefines AI‑Powered Service Agents with Multi‑Agent Architecture
This article systematically explains the evolution of Alibaba Cloud’s intelligent service platform, focusing on the Aivis digital employee, its three‑layer Planner‑Reasoner‑Executor architecture, context‑engineering optimizations, multi‑agent workflow, and practical recommendations for building enterprise‑grade AI‑driven customer service solutions.
1. Business Background and Challenges
Alibaba Cloud provides 24/7 technical support for millions of enterprise customers across a complex stack that includes ECS, PAI, OSS, SLB, VPC, ACK, and ECI. Service requests often involve highly technical issues such as remote‑connection failures, API errors, and network security configurations, requiring deep domain expertise rather than simple transactional handling.
2. Evolution of Human‑Machine Collaboration
The AI service journey follows three stages:
Human‑Machine Collaboration 1.0 (2018‑2022) : BERT‑based chatbots following a "question‑answer" pattern, with self‑service resolution rate as the primary metric.
Human‑Machine Collaboration 2.0 (2023‑2024) : GPT‑driven Copilot assistants that suggest actions and knowledge, aiming to reduce human handling time.
Human‑Machine Collaboration 3.0 (2024‑present) : The digital employee "Aivis", an autonomous agent that executes tasks end to end while humans supervise complex decisions, raising the service volume each human agent can oversee.
3. Multi‑Agent Architecture (Iceberg Model)
The architecture is divided into three layers:
Service Form Layer (Surface) : Chatbot, Copilot, Insights, Analyzer, and the digital employee Aivis.
Platform Capability Layer (Below Surface) : CloudArk platform providing Planner, Reasoner, Tool/MCP, SmartContext, and AivisDojo for service orchestration.
Model & Data Layer : Large models (Qwen, Alibaba‑domain models) combined with RAG, prompt engineering, and reinforcement learning, accessing product docs, knowledge bases, and historical tickets.
4. Planner‑Reasoner‑Executor Three‑Layer Design
Planner : Performs intent recognition, scene routing, and high‑level action planning without handling domain specifics.
Reasoner : Injects domain knowledge, executes logical reasoning, calls tools, and generates solutions.
Executor : Executes concrete tools (MCP, API, Sub‑Agent, RAG) and returns results.
Information flows through a shared Memory layer: the Planner's plan is passed down to the Reasoner, and the Executor's results are condensed before being passed back up, preventing context overflow and hallucinations.
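The layering can be sketched as three functions communicating only through a shared memory object. The names (`planner`, `reasoner`, `executor`, the tool map) are illustrative assumptions, not Aivis internals:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Shared store: the Planner writes a plan, the Executor writes compact results."""
    plan: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

def planner(query: str, memory: Memory) -> None:
    # Intent recognition and routing only; no domain specifics live here.
    if "email" in query:
        memory.plan = ["check_dns", "check_account"]

def reasoner(memory: Memory) -> list:
    # Injects domain knowledge: expands each high-level step into concrete tools.
    tool_map = {"check_dns": ["mx_lookup", "txt_lookup"],
                "check_account": ["account_status"]}
    return [tool for step in memory.plan for tool in tool_map.get(step, [])]

def executor(tools: list, memory: Memory) -> None:
    # Runs tools and stores only concise results, keeping the context small.
    for tool in tools:
        memory.results[tool] = f"{tool}: ok"
```

Because each layer reads and writes only the memory fields it owns, a verbose tool trace never leaks into the Planner's context.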
5. Flexible Reasoner Scheduling Example
For an "email cannot send" issue, the Reasoner dynamically triggers MX/TXT/CNAME checks, parallel account‑status queries, or error‑code matching based on the provided input, orchestrated by a Workflow that balances controllability and intelligence.
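A minimal sketch of that dispatch logic, assuming hypothetical check functions (real ones would query DNS and account APIs): independent checks are selected from the ticket's fields and run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical checks the Reasoner can schedule; stubs stand in for real lookups.
def check_mx(domain):      return ("mx", True)
def check_txt(domain):     return ("txt", True)
def check_account(uid):    return ("account", "active")

def diagnose_email(ticket: dict) -> dict:
    tasks = []
    if "domain" in ticket:   # DNS record checks can run in parallel
        tasks += [(check_mx, ticket["domain"]), (check_txt, ticket["domain"])]
    if "uid" in ticket:      # the account-status query is independent of DNS
        tasks.append((check_account, ticket["uid"]))
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return dict(f.result() for f in futures)
```

The workflow wrapper decides *which* checks are eligible (controllability); the model decides *when* to invoke them (intelligence).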
6. Optimization Experience 1 – Clarification
Avoid ambiguous prompts that cause hallucinations. For example, differentiate between "ECS instance locked" (business‑status lock) and "Windows account locked" (system‑level lock) by explicitly defining terminology in the prompt and forcing a tool‑based status check.
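One way to encode that disambiguation, shown here as an illustrative prompt fragment (the wording and the `DescribeInstances` reference are examples, not the production prompt):

```python
CLARIFIED_PROMPT = """\
Terminology (do not conflate the two kinds of "locked"):
- "ECS instance locked": the instance's *business status* is LOCKED
  (e.g. overdue payment or a security lock); verify via the instance
  status API (DescribeInstances) before answering.
- "Windows account locked": an *OS-level* account lockout inside the
  guest system; diagnose via VNC login, not the instance status API.

You MUST call the status-check tool and cite its output; never guess
which kind of lock the user means.
"""
```

Spelling out the two meanings and forcing a tool call removes the ambiguity that otherwise invites a hallucinated diagnosis.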
7. Optimization Experience 2 – Structured Representation
Use JSON, YAML, or pseudo‑code to express multi‑step tool calls, converting natural‑language sequences like "call A then B" into a deterministic JSON schema with IDs, parameters, and dependencies, greatly improving stability.
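As a sketch of what such a schema might look like (field names and the `$s1.output` placeholder convention are assumptions for illustration), "call A, then B with A's output" becomes an explicit dependency graph that can be executed deterministically:

```python
# "Call query_dns, then match_error_code on its output" as a structured plan.
plan = {
    "steps": [
        {"id": "s1", "tool": "query_dns",
         "params": {"domain": "example.com"}, "depends_on": []},
        {"id": "s2", "tool": "match_error_code",
         "params": {"dns": "$s1.output"}, "depends_on": ["s1"]},
    ]
}

def topo_order(plan: dict) -> list:
    """Resolve execution order from dependencies (assumes an acyclic plan)."""
    done, order = set(), []
    steps = {s["id"]: s for s in plan["steps"]}
    while len(order) < len(steps):
        for sid, s in steps.items():
            if sid not in done and all(d in done for d in s["depends_on"]):
                done.add(sid)
                order.append(sid)
    return order
```

Unlike a natural-language sequence, this form can be validated, retried step by step, and logged, which is where the stability gain comes from.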
8. Optimization Experience 3 – Context Feeding
Follow the principle "give what is needed, remove what distracts": inject only essential business logic into the prompt, strip unrelated fields, and avoid noisy data that leads to incorrect judgments (e.g., financial‑status mis‑interpretation).
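The principle reduces to whitelisting: pass each step only the fields it needs. A minimal sketch with invented field names:

```python
def slim_context(ticket: dict, needed: set) -> dict:
    """Keep only the fields the current step needs; drop distracting data."""
    return {k: v for k, v in ticket.items() if k in needed}

raw = {"instance_id": "i-123", "region": "cn-hangzhou",
       "balance": -3.2, "marketing_tags": ["vip"]}
# A network diagnosis needs the instance and region; a negative balance
# field might otherwise be misread as the root cause of the issue.
slimmed = slim_context(raw, {"instance_id", "region"})
```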
9. Memory Management for Long Dialogues
Memory Compression : Summarize early rounds, keep recent N rounds detailed.
Short‑Term Memory : Extract key facts and repeatedly inject them into prompts.
Long‑Term Memory : Persist complex scenarios for cross‑session retrieval.
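The compression step can be sketched as follows; a production system would summarize the old turns with an LLM rather than truncating them, but the shape of the transformation is the same:

```python
def compress_history(turns: list, keep_recent: int = 3) -> list:
    """Collapse early turns into one summary line; keep the last N verbatim."""
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # Placeholder summary; in practice an LLM would produce this line.
    summary = (f"[summary of {len(old)} earlier turns: "
               + "; ".join(t[:20] for t in old) + "]")
    return [summary] + recent
```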
10. Advanced Strategies
Custom Tool Protocols : Define domain‑specific function signatures beyond generic function‑call APIs to improve accuracy.
Few‑Shot Usage : Provide diverse examples for single‑task scenarios, but limit examples in flexible tasks so the model does not over‑fit to the samples.
Context Slimming : Continuously prune tokens while preserving performance to prevent catastrophic forgetting.
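A custom tool protocol can be as simple as a typed signature the agent's calls are validated against before execution. This sketch (names and fields assumed) shows the idea of going beyond a generic, untyped function-call API:

```python
from dataclasses import dataclass

@dataclass
class ToolSpec:
    """A domain-specific contract: stricter than a generic function call."""
    name: str
    required: dict   # parameter name -> expected Python type

    def validate(self, call: dict) -> bool:
        # Every required parameter must be present with the right type.
        return all(isinstance(call.get(p), t) for p, t in self.required.items())

check_dns = ToolSpec("check_dns", {"domain": str, "record_type": str})
```

Rejecting malformed calls before they reach a backend turns a silent wrong answer into an explicit, retryable error.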
11. Building Enterprise‑Grade Intelligent Services
The recommended three‑step path is:
Step 1 – Single‑Point Breakthrough : Target high‑frequency, high‑value, low‑complexity use cases (e.g., order status queries) to demonstrate AI value.
Step 2 – Process Integration : Embed AI into end‑to‑end workflows, creating a "human‑AI flywheel" where experts both use and train the model.
Step 3 – Human‑Machine Collaboration : Scale from a "pioneer bot" to specialized AI squads and finally to a fully integrated "intelligent organization" where silicon‑based agents and carbon‑based staff co‑operate.
Key metrics for AI adoption include the proportion of business handled autonomously by AI and the share of silicon‑based digital employees within the service team.
12. Q&A Highlights
Q1: How to connect AI‑generated forms to internal APIs? Use a low‑code platform to generate H5 components (frontend) and wrap internal functions as plugins (backend). Authentication is handled by a dedicated Authentication Tool that injects a token or session into the conversation context.
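The backend half of that answer can be sketched as a decorator that wraps an internal function as a plugin and injects a token via a dedicated auth tool. All names here (`auth_tool`, `as_plugin`, `query_order`, the demo token) are hypothetical:

```python
def auth_tool(session: dict) -> dict:
    # Hypothetical: production code would exchange credentials for a real token.
    session["token"] = "tkn-demo"
    return session

def as_plugin(fn):
    """Wrap an internal function so the agent can only call it authenticated."""
    def plugin(session: dict, **params):
        if "token" not in session:
            auth_tool(session)   # inject auth into the conversation context
        return fn(**params)
    return plugin

@as_plugin
def query_order(order_id: str) -> str:
    # Stands in for a real internal API call.
    return f"order {order_id}: shipped"
```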
Q2: How to implement checkpoint‑based resume for long workflows? Two approaches: (1) modify the workflow engine to persist state (variables, context) at each node for node‑level retry; (2) wrap the workflow with a meta‑agent that reads persisted state and re‑starts from the failed node.
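Approach (1) can be sketched as follows: state is persisted to a file after every node, so a rerun skips completed nodes and resumes at the one that failed. The file format and node signature are assumptions:

```python
import json
import pathlib

def run_workflow(nodes, state_file="wf_state.json"):
    """Run (name, fn) nodes, persisting state after each for node-level retry."""
    path = pathlib.Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {"done": []}
    for name, fn in nodes:
        if name in state["done"]:
            continue                      # completed before a previous crash
        state[name] = fn(state)           # may raise; earlier state is on disk
        state["done"].append(name)
        path.write_text(json.dumps(state))
    return state
```

Approach (2), the meta-agent, would read the same persisted state but decide *how* to resume (retry, skip, or replan) instead of blindly re-running.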
Q3: Why separate Planner and Reasoner? Planner stays lightweight, handling intent and routing, while Reasoner receives only domain‑specific knowledge, preventing context interference and hallucination. For complex 20+ step diagnostics, we use hierarchical degradation, batch planning, and sub‑agent decomposition to keep tool‑call depth manageable.
In summary, Aivis demonstrates that with careful context engineering, multi‑agent design, and layered memory management, AI can bridge the gap between rapid response expectations and the high technical complexity of cloud service support, paving the way for the next generation of enterprise‑level intelligent services.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.