How Large Language Models Power XiaoAI: From Intent Routing to Response Generation

This article explores how large language models are integrated into Xiaomi’s XiaoAI assistant, detailing the system’s architecture, intent distribution, domain-specific understanding, and response generation, while sharing practical challenges, prompt engineering solutions, and fine‑tuning strategies that boosted user retention and query satisfaction.

DataFunSummit

Overview of XiaoAI

XiaoAI is an omnipresent AI assistant covering suggestions, voice, vision, translation, and calling, deployed on phones, speakers, TVs, and Xiaomi cars.

Since the release of ChatGPT, the large-model wave has prompted companies both in China and abroad to explore large-model applications. Xiaomi decided to rebuild XiaoAI around large models, achieving a 10% increase in next-day user retention and an 8% rise in query satisfaction.

Architecture diagram

Large Model Intent Distribution

Purpose and Benefits

The intent‑distribution model determines the intent category of a user query and routes it to a domain‑specific agent for deeper understanding. By delegating to specialized agents, model iteration becomes easier and faster, improving the ability to meet user needs.
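The delegation pattern described above can be sketched as a classifier in front of a registry of domain agents. The intent labels, agent handlers, and keyword-based stand-in classifier below are illustrative assumptions, not XiaoAI's actual taxonomy or model:

```python
# Minimal sketch of large-model intent distribution: a classifier
# predicts an intent label, and a registry maps labels to
# domain-specific agents for deeper understanding. Intent names and
# handlers are hypothetical.
from typing import Callable, Dict

# Registry of domain agents, keyed by intent label.
AGENTS: Dict[str, Callable[[str], str]] = {}

def register_agent(intent: str):
    """Decorator that registers a handler under an intent label."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        AGENTS[intent] = fn
        return fn
    return wrap

@register_agent("device_control")
def device_agent(query: str) -> str:
    return f"[device_control] executing: {query}"

@register_agent("chat")
def chat_agent(query: str) -> str:
    return f"[chat] responding to: {query}"

def classify_intent(query: str) -> str:
    """Stand-in for the intent-distribution model; keyword rules here."""
    if any(w in query for w in ("open", "turn on", "turn off")):
        return "device_control"
    return "chat"

def route(query: str) -> str:
    """Classify the query, then delegate to the matching domain agent."""
    intent = classify_intent(query)
    handler = AGENTS.get(intent, chat_agent)  # fall back to open chat
    return handler(query)
```

Because each agent is registered independently, a single domain can be retrained and redeployed without touching the router or the other agents, which is the iteration benefit the article describes.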

Challenges

Two main difficulties arise: (1) the model needs world knowledge to distinguish similar commands (e.g., "open settings" vs. "open the air-conditioner"); (2) end-to-end latency must stay under 200 ms in XiaoAI's interactive scenarios.
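One simple way to validate the 200 ms constraint offline is to time repeated calls to a candidate classifier and compare a high percentile against the budget. The run count and the choice of p95 below are illustrative, not part of XiaoAI's stated methodology:

```python
# Sketch of an offline latency check against the 200 ms budget.
import time

BUDGET_MS = 200.0  # latency ceiling from the requirement above

def within_budget(classify, query="open settings", runs=20):
    """Return (meets_budget, p95_ms) over `runs` timed calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        classify(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[int(0.95 * (runs - 1))]  # nearest-rank p95
    return p95 <= BUDGET_MS, p95
```

Measuring a tail percentile rather than the mean matters here, because a voice assistant feels slow whenever any single response exceeds the budget.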

Approaches and Fine‑tuning

Initially, prompt engineering with few-shot examples was used to define intents and provide sample queries. This mitigated some issues but introduced new ones: the prompt grew long enough to exceed token limits and violate the latency constraint, and smaller models often ignored the instructions; only very large models (hundreds of billions of parameters) followed prompts reliably.
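The few-shot approach can be sketched as below. The intent names and sample queries are invented for illustration; the point is that prompt length grows with (intents × examples), which is exactly what broke the token and latency budgets:

```python
# Sketch of few-shot intent prompting: intent definitions plus sample
# queries are concatenated into one prompt for every request. Intents
# and examples are hypothetical.
INTENTS = {
    "device_control": ["open settings", "turn on the air-conditioner"],
    "music": ["play some jazz", "next song"],
    "weather": ["will it rain tomorrow", "weather in Beijing"],
}

def build_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt for one query."""
    lines = ["Classify the user query into one of the intents below."]
    for intent, examples in INTENTS.items():
        lines.append(f"Intent: {intent}")
        for ex in examples:
            lines.append(f'  e.g. "{ex}"')
    lines.append(f'Query: "{query}"')
    lines.append("Intent:")
    return "\n".join(lines)
```

Every new intent or example is paid for on every single request, so token count (and therefore latency) scales with the size of the taxonomy rather than with the query.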

Adding explicit intent definitions to the few-shot prompt helped, but the token overhead remained prohibitive. Consequently, the team turned to fine-tuning the large model on domain-specific data, achieving more consistent intent prediction within the required latency budget.
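Fine-tuning moves the intent knowledge from the prompt into the weights, so training data must carry what the few-shot examples used to. A common convention is an instruction-tuning JSONL file of (query, intent) pairs; the schema below is an assumption for illustration, as XiaoAI's actual training format is not public:

```python
# Sketch of preparing domain-specific fine-tuning data as JSONL
# instruction/input/output records. The schema is a widely used
# convention, assumed here.
import json

def to_jsonl(samples):
    """Convert (query, intent) pairs into instruction-tuning records."""
    lines = []
    for query, intent in samples:
        record = {
            "instruction": "Classify the intent of the user query.",
            "input": query,
            "output": intent,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

After fine-tuning on such data, inference needs only the query itself, which keeps per-request token counts (and latency) small regardless of how many intents the taxonomy contains.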

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Large Language Models, model fine-tuning, AI assistants, Intent Routing, XiaoAI
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
