How Alibaba Cloud’s Aivis Redefines AI‑Powered Service Agents with Multi‑Agent Architecture

This article systematically explains the evolution of Alibaba Cloud’s intelligent service platform, focusing on the Aivis digital employee, its three‑layer Planner‑Reasoner‑Executor architecture, context‑engineering optimizations, multi‑agent workflow, and practical recommendations for building enterprise‑grade AI‑driven customer service solutions.


1. Business Background and Challenges

Alibaba Cloud provides 24/7 technical support for millions of enterprise customers across a complex stack that includes ECS, PAI, OSS, SLB, VPC, ACK, and ECI. Service requests often involve highly technical issues such as remote‑connection failures, API errors, and network security configurations, requiring deep domain expertise rather than simple transactional handling.

2. Evolution of Human‑Machine Collaboration

The AI service journey follows three stages:

Human‑Machine Collaboration 1.0 (2018–2022): BERT‑based chatbots following a "question‑answer" pattern, with self‑service resolution rate as the primary metric.

Human‑Machine Collaboration 2.0 (2023–2024): GPT‑driven Copilot assistants that suggest actions and knowledge to human agents, aiming to reduce handling time.

Human‑Machine Collaboration 3.0 (2024–present): the digital employee "Aivis", an autonomous agent that executes tasks while humans supervise complex decisions, raising per‑agent service volume.

3. Multi‑Agent Architecture (Iceberg Model)

The architecture is divided into three layers:

Service Form Layer (Surface): Chatbot, Copilot, Insights, Analyzer, and the digital employee Aivis.

Platform Capability Layer (Below Surface): the CloudArk platform, providing Planner, Reasoner, Tool/MCP, SmartContext, and AivisDojo for service orchestration.

Model & Data Layer: large models (Qwen, Alibaba domain models) combined with RAG, prompt engineering, and reinforcement learning, drawing on product docs, knowledge bases, and historical tickets.

Figure: Iceberg Model

4. Planner‑Reasoner‑Executor Three‑Layer Design

Planner: performs intent recognition, scene routing, and high‑level action planning without handling domain specifics.

Reasoner: injects domain knowledge, executes logical reasoning, calls tools, and generates solutions.

Executor: executes concrete tools (MCP, API, Sub‑Agent, RAG) and returns results.

Information flows through a shared Memory layer: Planner outputs are passed down to the Reasoner, and the Executor's results are returned upward in concise form, preventing context overflow and hallucinations.
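The flow above can be sketched in a few lines of Python. This is a minimal illustration, not Aivis internals: all class, function, and tool names here are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Shared memory layer: carries the Planner's plan down and concise results up."""
    plan: str = ""
    facts: dict = field(default_factory=dict)

def planner(query: str, mem: Memory) -> None:
    # Intent recognition and routing only; no domain specifics.
    mem.plan = "diagnose_connectivity" if "connect" in query else "general_support"

def reasoner(mem: Memory) -> list[str]:
    # Injects domain knowledge and decides which tools to run.
    if mem.plan == "diagnose_connectivity":
        return ["check_security_group", "check_instance_status"]
    return ["search_knowledge_base"]

def executor(tool_calls: list[str], mem: Memory) -> None:
    # Runs concrete tools; writes only concise results back to memory.
    for tool in tool_calls:
        mem.facts[tool] = f"{tool}: ok"  # stand-in for a real tool call

mem = Memory()
planner("ECS remote connect fails", mem)
executor(reasoner(mem), mem)
print(mem.facts)
```

The key design point is that each layer touches only the memory fields it owns, so the Planner never sees raw tool output and the Reasoner never sees the full dialogue.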

Figure: Three‑Layer Architecture

5. Flexible Reasoner Scheduling Example

For an "email cannot send" issue, the Reasoner dynamically triggers MX/TXT/CNAME checks, parallel account‑status queries, or error‑code matching based on the provided input, orchestrated by a Workflow that balances controllability and intelligence.
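One way to picture this input-driven scheduling is the sketch below, where checks are triggered only when the corresponding input is present and independent checks run in parallel. The tool functions and ticket fields are illustrative assumptions, not the actual Aivis workflow.

```python
import concurrent.futures

def check_dns_records(domain: str) -> dict:
    # Stand-in for real MX/TXT/CNAME lookups.
    return {rec: "present" for rec in ("MX", "TXT", "CNAME")}

def check_account_status(account: str) -> str:
    return "active"  # stand-in for an account-status query

def match_error_code(code: str) -> str:
    return {"550": "recipient rejected"}.get(code, "unknown")

def reasoner_schedule(ticket: dict) -> dict:
    """Trigger only the checks the ticket's inputs make possible; run them in parallel."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {}
        if "domain" in ticket:
            futures["dns"] = pool.submit(check_dns_records, ticket["domain"])
        if "account" in ticket:
            futures["account"] = pool.submit(check_account_status, ticket["account"])
        if "error_code" in ticket:
            futures["error"] = pool.submit(match_error_code, ticket["error_code"])
        for name, fut in futures.items():
            results[name] = fut.result()
    return results

out = reasoner_schedule({"domain": "example.com", "error_code": "550"})
print(out)
```

The workflow layer keeps this controllable (only whitelisted checks can fire) while the model keeps it intelligent (deciding which inputs are worth extracting from the dialogue).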

Figure: Reasoner Scheduling

6. Optimization Experience 1 – Clarification

Avoid ambiguous prompts that cause hallucinations. For example, differentiate between "ECS instance locked" (business‑status lock) and "Windows account locked" (system‑level lock) by explicitly defining terminology in the prompt and forcing a tool‑based status check.
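A prompt fragment for this case might look like the sketch below. The wording and the referenced API fields are illustrative assumptions (the `DescribeInstances` call and its `OperationLocks` field are from the public ECS API, but the exact prompt is not from Aivis).

```python
# Hypothetical prompt template that pins down the two meanings of "locked"
# and forces a tool-based status check before the model answers.
CLARIFIED_PROMPT = """\
Terminology (use these definitions exactly):
- "ECS instance locked": a business-status lock set by Alibaba Cloud
  (e.g. overdue payment or security violation). Verify via the instance-status
  tool (DescribeInstances -> OperationLocks) before answering.
- "Windows account locked": an OS-level lockout inside the guest after
  repeated failed logins; unrelated to the instance's business status.

Rule: never guess which lock the user means. Call the instance-status tool
first, then answer based on the returned lock reason.

User issue: {issue}
"""

prompt = CLARIFIED_PROMPT.format(issue="My ECS is locked, I cannot log in.")
print(prompt)
```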

Figure: Clarification Example

7. Optimization Experience 2 – Structured Representation

Use JSON, YAML, or pseudo‑code to express multi‑step tool calls, converting natural‑language sequences like "call A then B" into a deterministic JSON schema with IDs, parameters, and dependencies, greatly improving stability.
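A minimal sketch of that conversion is shown below. The field names (`id`, `tool`, `params`, `depends_on`) are assumptions for illustration, not the Aivis schema; the point is that explicit IDs and dependencies make the execution order deterministic.

```python
import json

# "Call A then B" expressed as a deterministic plan instead of free text.
plan = {
    "steps": [
        {"id": "s1", "tool": "check_dns_records",
         "params": {"domain": "example.com"}, "depends_on": []},
        {"id": "s2", "tool": "check_mail_quota",
         "params": {"account": "user@example.com"},
         "depends_on": ["s1"]},  # s2 runs only after s1 succeeds
    ]
}

def execution_order(plan: dict) -> list[str]:
    """Topologically order steps by their declared dependencies."""
    done, order = set(), []
    pending = list(plan["steps"])
    while pending:
        ready = [s for s in pending if set(s["depends_on"]) <= done]
        for s in ready:
            order.append(s["id"])
            done.add(s["id"])
            pending.remove(s)
    return order

print(json.dumps(plan, indent=2))
print(execution_order(plan))
```

Because the schema is machine-checkable, a malformed plan can be rejected and regenerated before any tool runs.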

Figure: Structured Representation

8. Optimization Experience 3 – Context Feeding

Follow the principle "give what is needed, remove what distracts": inject only essential business logic into the prompt, strip unrelated fields, and avoid noisy data that leads to incorrect judgments (e.g., financial‑status mis‑interpretation).
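One concrete way to apply this principle is a per-scenario field whitelist applied before the prompt is built, as in the sketch below. The scenario names and record fields are made up for illustration.

```python
# Fields each scenario actually needs; everything else is stripped as noise.
RELEVANT_FIELDS = {
    "connectivity": {"instance_id", "region", "security_group", "status"},
    "billing": {"instance_id", "payment_status", "overdue_amount"},
}

def slim_context(record: dict, scenario: str) -> dict:
    """Keep only the fields the current scenario requires."""
    keep = RELEVANT_FIELDS[scenario]
    return {k: v for k, v in record.items() if k in keep}

ticket = {
    "instance_id": "i-abc123",
    "region": "cn-hangzhou",
    "status": "Running",
    "security_group": "sg-001",
    "payment_status": "overdue",   # noisy for a connectivity case: could
                                   # mislead the model into a billing answer
    "marketing_tags": ["promo"],
}

ctx = slim_context(ticket, "connectivity")
print(ctx)
```

In a connectivity diagnosis, the overdue payment flag above is exactly the kind of distractor that produces the financial-status misjudgment mentioned in the text.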

Figure: Context Feeding

9. Memory Management for Long Dialogues

Memory Compression: summarize early rounds, keep the most recent N rounds in full detail.

Short‑Term Memory: extract key facts and repeatedly inject them into prompts.

Long‑Term Memory: persist complex scenarios for cross‑session retrieval.
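The compression strategy above can be sketched as follows; `summarize()` here is a trivial stand-in for what would be an LLM summarization call in practice.

```python
def summarize(turns: list[dict]) -> str:
    # Stand-in for an LLM call that condenses early dialogue turns.
    return "Summary: " + "; ".join(t["content"][:25] for t in turns)

def compress_history(history: list[dict], keep_recent: int = 2) -> list[dict]:
    """Replace all but the last `keep_recent` turns with a single summary turn."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent

history = [
    {"role": "user", "content": "My ECS instance cannot be reached."},
    {"role": "assistant", "content": "Checked security group, port 22 is closed."},
    {"role": "user", "content": "I opened port 22, still failing."},
    {"role": "assistant", "content": "Instance status shows Stopped."},
]

compressed = compress_history(history, keep_recent=2)
print(compressed)
```

Short-term memory would additionally pin extracted facts (instance ID, error codes) into every prompt, while long-term memory persists the whole resolved case for later retrieval.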

Figure: Memory Management

10. Advanced Strategies

Custom Tool Protocols: define domain‑specific function signatures beyond generic function‑call APIs to improve accuracy.

Few‑Shot Usage: provide diverse examples for single‑task scenarios, but limit them for flexible tasks to avoid over‑fitting.

Context Slimming: continuously prune tokens while preserving performance to prevent catastrophic forgetting.
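The custom-tool-protocol idea can be illustrated as below: instead of a generic untyped function-call schema, parameters are constrained by domain types so invalid model output is rejected at parse time. All names here (lock types, request fields) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class LockType(Enum):
    FINANCIAL = "financial"   # e.g. overdue payment
    SECURITY = "security"     # e.g. violation lock

@dataclass
class UnlockRequest:
    instance_id: str
    lock_type: LockType

    def validate(self) -> None:
        # Domain rule encoded in the protocol, not left to the model.
        if not self.instance_id.startswith("i-"):
            raise ValueError(f"invalid instance id: {self.instance_id}")

def parse_tool_call(raw: dict) -> UnlockRequest:
    """Reject any tool call whose parameters fall outside the domain types."""
    req = UnlockRequest(raw["instance_id"], LockType(raw["lock_type"]))
    req.validate()
    return req

req = parse_tool_call({"instance_id": "i-abc123", "lock_type": "financial"})
print(req)
```

A model emitting `"lock_type": "frozen"` or a malformed instance ID fails fast here, rather than propagating a bad call into the Executor.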

11. Building Enterprise‑Grade Intelligent Services

The recommended three‑step path is:

Step 1 – Single‑Point Breakthrough: target high‑frequency, high‑value, low‑complexity use cases (e.g., order status queries) to demonstrate AI value.

Step 2 – Process Integration: embed AI into end‑to‑end workflows, creating a "human‑AI flywheel" in which experts both use and train the model.

Step 3 – Human‑Machine Collaboration: scale from a single "pioneer bot" to specialized AI squads and finally to a fully integrated "intelligent organization" where silicon‑based agents and carbon‑based staff cooperate.

Key metrics for AI adoption include the proportion of business handled autonomously by AI and the share of silicon‑based digital employees within the service team.

12. Q&A Highlights

Q1: How to connect AI‑generated forms to internal APIs? Use a low‑code platform to generate H5 components (frontend) and wrap internal functions as plugins (backend). Authentication is handled by a dedicated Authentication Tool that injects a token or session into the conversation context.
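The plugin-plus-authentication pattern in this answer can be sketched as follows. The function and token names are hypothetical; in practice the token would come from a real auth service rather than a constant.

```python
def auth_tool(context: dict) -> dict:
    """Dedicated authentication tool: injects a token into the conversation context."""
    context["token"] = "tok-demo"  # stand-in for a fetched token/session
    return context

def make_plugin(fn):
    """Wrap an internal function as a plugin that refuses to run unauthenticated."""
    def plugin(context: dict, **params):
        if "token" not in context:
            raise PermissionError("call the Authentication Tool first")
        return fn(**params)
    return plugin

def query_order_status(order_id: str) -> str:
    # Stand-in for an internal API behind the plugin wrapper.
    return f"order {order_id}: shipped"

order_plugin = make_plugin(query_order_status)
ctx = auth_tool({})
print(order_plugin(ctx, order_id="A100"))
```

The frontend H5 form generated by the low-code platform would collect `order_id` and submit it; the wrapper ensures every backend call is gated on the injected credential.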

Q2: How to implement checkpoint‑based resume for long workflows? Two approaches: (1) modify the workflow engine to persist state (variables, context) at each node for node‑level retry; (2) wrap the workflow with a meta‑agent that reads persisted state and re‑starts from the failed node.
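Approach (1), persisting state at each node, can be sketched like this. The node structure and state file format are illustrative assumptions; a real workflow engine would persist to a database rather than a local JSON file.

```python
import json
import os
import tempfile

def run_workflow(nodes, state_path: str) -> dict:
    """Run nodes in order, checkpointing after each; resume skips completed nodes."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)           # resume from the last checkpoint
    else:
        state = {"done": [], "vars": {}}

    for name, fn in nodes:
        if name in state["done"]:
            continue                        # already completed in a prior run
        state["vars"][name] = fn(state["vars"])
        state["done"].append(name)
        with open(state_path, "w") as f:    # checkpoint variables and progress
            json.dump(state, f)
    return state

nodes = [
    ("fetch_ticket", lambda v: "ticket-1"),
    ("diagnose", lambda v: f"diagnosed {v['fetch_ticket']}"),
]
path = os.path.join(tempfile.mkdtemp(), "state.json")
state = run_workflow(nodes, path)
print(state["done"])
```

Approach (2) differs only in where this logic lives: a meta-agent outside the engine reads the same persisted state and re-invokes the workflow from the failed node.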

Q3: Why separate Planner and Reasoner? Planner stays lightweight, handling intent and routing, while Reasoner receives only domain‑specific knowledge, preventing context interference and hallucination. For complex 20+ step diagnostics, we use hierarchical degradation, batch planning, and sub‑agent decomposition to keep tool‑call depth manageable.

In summary, Aivis demonstrates that with careful context engineering, multi‑agent design, and layered memory management, AI can bridge the gap between rapid response expectations and the high technical complexity of cloud service support, paving the way for the next generation of enterprise‑level intelligent services.

Tags: cloud services, architecture, multi-agent, digital employee, context engineering
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
