How Alibaba Cloud Built Service‑Domain AI Agents: Design, Practice, and Results
This article explains how Alibaba Cloud designed and deployed large‑language‑model agents for its service domain, covering background, ideal LLM deployment, the shift from explanation to problem solving, the agent framework, practical implementation, automation trade‑offs, training, evaluation, and real‑world impact.
Background
At the end of 2022, large‑language‑model (LLM) agents sparked widespread interest as a potential path toward AGI and a key technology for applying LLMs across domains. Alibaba Cloud partnered with Tongyi Lab to train a domain‑specific LLM and upgrade its customer‑service robot from a traditional QA bot to a generative dialogue system, with the agent module as a core component.
Ideal Form of LLM Deployment
Traditional LLM QA bots handle factual or knowledge‑based queries via pure text responses. However, real‑world scenarios require the model to act on the physical world, such as closing curtains or processing refunds, which demands an agent that can translate natural language commands into concrete actions.
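The separation this implies can be sketched in a few lines: the LLM emits a structured action, and a platform-side executor carries it out. The `Action` schema and handler names below are illustrative assumptions, not Alibaba Cloud's actual interfaces.

```python
# Minimal sketch (hypothetical schema): the agent turns a natural-language
# command into a structured action that an executor can run, instead of
# replying with text alone.
from dataclasses import dataclass

@dataclass
class Action:
    api: str      # which backend API to invoke
    params: dict  # arguments extracted from the user's utterance

def execute(action: Action) -> str:
    # A production system would call a real backend; these handlers simulate it.
    handlers = {
        "close_curtains": lambda p: "curtains closed",
        "process_refund": lambda p: f"refund of {p['amount']} issued",
    }
    return handlers[action.api](action.params)

# The LLM's job is to emit the Action; the platform's job is to execute it.
result = execute(Action(api="process_refund", params={"amount": "$12.00"}))
```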
From Explanation to Solution
Instead of merely explaining problems, agents must solve them by leveraging the LLM’s strong semantic understanding, chain‑of‑thought reasoning, and step‑by‑step planning to execute actions via external tools and APIs.
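The plan-act-observe cycle this describes can be sketched as a ReAct-style loop. The scripted `fake_llm_plan` stands in for a real LLM call; all tool names here are illustrative, not the article's actual APIs.

```python
# A sketch of step-by-step planning with tool execution: the "LLM" plans one
# step at a time from the goal plus prior observations, acts via a tool, and
# observes the result until it decides the problem is solved.
def fake_llm_plan(goal, observations):
    # A real deployment would prompt the domain LLM here; this stub scripts
    # two tool calls and then finishes.
    if not observations:
        return ("call_tool", "lookup_order")
    if len(observations) == 1:
        return ("call_tool", "issue_refund")
    return ("finish", "Refund completed for order 42")

def run_agent(goal, tools):
    observations = []
    while True:
        kind, payload = fake_llm_plan(goal, observations)
        if kind == "finish":
            return payload              # solve, not just explain
        observations.append(tools[payload]())  # act, then observe

tools = {
    "lookup_order": lambda: "order 42 found",
    "issue_refund": lambda: "refund queued",
}
answer = run_agent("refund my order", tools)
```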
Agent Design Framework
Following the architecture described in Lilian Weng’s "LLM Powered Autonomous Agents," an agent consists of Planning, Memory, Tools, and Action. Planning decomposes tasks, reflects, and improves; Memory provides short‑ and long‑term context; Tools enable API calls; Action decides the final operation.
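The four components map naturally onto a small skeleton class. This is an illustrative sketch of the architecture, not Alibaba Cloud's implementation; the `plan` dictionary shape and stub LLM are assumptions.

```python
# Planning, Memory, Tools, and Action as a minimal agent skeleton.
class Agent:
    def __init__(self, llm, tools):
        self.llm = llm        # Planning: the LLM decomposes tasks and reflects
        self.memory = []      # Memory: short-term dialogue context
        self.tools = tools    # Tools: callable external APIs

    def step(self, user_input):
        self.memory.append(("user", user_input))
        plan = self.llm(self.memory)            # plan over the full context
        if plan["type"] == "tool":              # Action: choose an operation
            observation = self.tools[plan["name"]](**plan["args"])
            self.memory.append(("tool", observation))
            return observation
        self.memory.append(("assistant", plan["text"]))
        return plan["text"]

# Stub LLM: request one tool call after a user turn, otherwise answer.
def stub_llm(memory):
    if memory[-1][0] == "user":
        return {"type": "tool", "name": "ping", "args": {}}
    return {"type": "answer", "text": "done"}

agent = Agent(stub_llm, {"ping": lambda: "pong"})
observation = agent.step("Why can't I connect?")
```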
Service Domain Agent Design
In Alibaba Cloud’s after‑sales support, customer issues fall into four categories (factual, diagnostic, fuzzy, and other), with diagnostic issues being the most common. The agent workflow mirrors a human support engineer: identify the problem, query SOP tools, ask clarifying questions, retrieve information, and finally provide a solution.
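The five-stage workflow can be sketched as an ordered pipeline. The stage names follow the article; every function body, instance ID, and diagnostic finding below is an illustrative stub.

```python
# The support-engineer workflow as a pipeline: identify -> query SOP ->
# clarify -> retrieve -> solve. Each stage enriches a shared context dict.
def identify(issue):
    return {"category": "diagnostic", "issue": issue}

def query_sop(ctx):
    ctx["sop"] = ["check instance status", "check security group"]
    return ctx

def clarify(ctx, answers):
    ctx.update(answers)   # fill in parameters the user has not yet provided
    return ctx

def retrieve(ctx):
    ctx["findings"] = "security group blocks port 22"
    return ctx

def solve(ctx):
    return f"Open port 22 in the security group ({ctx['instance_id']})"

ctx = identify("cannot SSH into ECS instance")
ctx = query_sop(ctx)
ctx = clarify(ctx, {"instance_id": "i-demo123"})  # ask the user when missing
ctx = retrieve(ctx)
solution = solve(ctx)
```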
Automation, Cost, and Controllability
Multi‑step API calls are the most time‑consuming part of a diagnostic session, so APIs are designed to be "plug‑and‑play", reducing the number of calls required. Asynchronous card rendering shows progress for long‑running diagnostics, improving the user experience. High‑quality fine‑tuning data improves the accuracy of API selection, clarifying questions, and parameter extraction, minimizing execution failures.
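The asynchronous-rendering idea can be sketched with `asyncio`: the diagnostic runs as a background task and pushes status updates as it progresses, while the dialogue loop stays responsive. The "card" here is just a list of status strings; real cards are UI widgets, and the stage names are invented for illustration.

```python
# Sketch of asynchronous progress rendering for a long-running diagnostic.
import asyncio

async def long_diagnostic(card):
    for stage in ("collecting logs", "analyzing metrics", "done"):
        card.append(stage)        # update the card as work progresses
        await asyncio.sleep(0)    # yield control; real stages take seconds

async def main():
    card = []
    task = asyncio.create_task(long_diagnostic(card))
    # The chat surface could render intermediate card states here.
    await task
    return card

card = asyncio.run(main())
```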
Training and Evaluation
The domain LLM builds on Qwen's agent capabilities and is further fine‑tuned for Alibaba Cloud's specific services. A benchmark evaluates API selection, action execution, parameter extraction, end‑to‑end success, and generation quality (BLEU, ROUGE‑L) to select the production model.
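Three of the benchmark dimensions can be sketched as simple per-case scoring functions. The metric definitions below (exact match on API name and parameters, end-to-end success as both being correct) are plausible assumptions, not the article's published definitions, and the sample data is invented.

```python
# Sketch of per-dimension benchmark scoring over (prediction, gold) pairs.
def api_selection_acc(preds, golds):
    return sum(p["api"] == g["api"] for p, g in zip(preds, golds)) / len(golds)

def param_exact_match(preds, golds):
    return sum(p["params"] == g["params"] for p, g in zip(preds, golds)) / len(golds)

def end_to_end(preds, golds):
    # A case succeeds only if both the API and all its parameters are right.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

golds = [{"api": "restart", "params": {"id": "i-1"}},
         {"api": "resize",  "params": {"id": "i-2", "size": "large"}}]
preds = [{"api": "restart", "params": {"id": "i-1"}},
         {"api": "resize",  "params": {"id": "i-2", "size": "small"}}]

scores = (api_selection_acc(preds, golds),
          param_exact_match(preds, golds),
          end_to_end(preds, golds))
```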
Real‑World Effectiveness
Three typical cases are demonstrated: direct agent triggering with automatic parameter extraction, agent‑driven clarification when inputs are missing, and asynchronous card rendering for complex diagnostics. These deployments cover the top 30 high‑frequency scenarios and improve self‑service resolution rates by over 10% compared to pure text generation.
Conclusion
Agents represent a rapidly growing direction for LLM applications. Future work includes finer‑grained API scheduling, reducing tool development costs, and integrating reasoning structures such as Tree‑of‑Thought or Graph‑of‑Thought to further enhance agent intelligence.
As an illustration of the tool interface, a tool in the Qwen Agent style is declared with a name, a description, and a JSON-style parameter schema that the LLM uses for parameter extraction:

```python
# Tool declaration: name, natural-language description, and parameter schema.
name = 'my_image_gen'
description = ('AI painting (image generation) service: takes a text '
               'description and returns the URL of the generated image.')
parameters = [{
    'name': 'prompt',
    'type': 'string',
    'description': 'Detailed description of the desired image content, in English',
    'required': True
}]
```

Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.