How Large Language Models Power XiaoAI: From Intent Routing to Response Generation
This article explores how large language models are integrated into Xiaomi’s XiaoAI assistant, covering the system’s architecture, intent distribution, domain-specific understanding, and response generation. It also shares practical challenges, prompt-engineering solutions, and fine-tuning strategies that boosted user retention and query satisfaction.
Overview of XiaoAI
XiaoAI is an omnipresent AI assistant covering suggestions, voice, vision, translation, and calling, deployed on phones, speakers, TVs, and Xiaomi cars.
Since the release of ChatGPT, the large-model wave has prompted companies in China and abroad to explore large-model applications. Xiaomi decided to rebuild XiaoAI around large models, achieving a 10% increase in next-day user retention and an 8% rise in query satisfaction.
Large Model Intent Distribution
Purpose and Benefits
The intent‑distribution model determines the intent category of a user query and routes it to a domain‑specific agent for deeper understanding. By delegating to specialized agents, model iteration becomes easier and faster, improving the ability to meet user needs.
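The routing described above can be sketched as follows. This is a minimal illustration with hypothetical agent names and a rule-based stand-in classifier; the article does not disclose XiaoAI's actual interfaces, and in production the classification step would be a large-model call.

```python
from typing import Callable, Dict

# Each domain-specific agent is a callable that performs deeper
# understanding of a query (hypothetical agents for illustration).
AGENTS: Dict[str, Callable[[str], str]] = {
    "device_control": lambda q: f"[device_control] handling: {q}",
    "translation":    lambda q: f"[translation] handling: {q}",
    "chitchat":       lambda q: f"[chitchat] handling: {q}",
}

def classify_intent(query: str) -> str:
    """Stand-in for the intent-distribution model (keyword rules here;
    in the real system this is a large-model prediction)."""
    if "translate" in query:
        return "translation"
    if "open" in query or "turn on" in query:
        return "device_control"
    return "chitchat"

def route(query: str) -> str:
    """Determine the intent, then delegate to that domain's agent."""
    intent = classify_intent(query)
    return AGENTS[intent](query)

print(route("open the air conditioner"))  # dispatched to device_control
```

Because each agent owns its own domain, an agent can be retrained or replaced without touching the router or the other agents, which is the iteration benefit the article describes.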
Challenges
Two main difficulties arise: (1) the model requires knowledge to distinguish similar commands (e.g., "open settings" vs. "open air‑conditioner"); (2) latency must stay under 200 ms for XiaoAI scenarios.
Approaches and Fine‑tuning
Initially, the team relied on prompt engineering: few-shot prompts defined each intent and provided sample queries. This mitigated some issues but introduced new ones: the prompt grew long enough to exceed token limits and violate the latency constraint, and smaller models often ignored the instructions; only very large models (hundreds of billions of parameters) followed the prompts reliably.
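A few-shot routing prompt of the kind described above might be assembled like this. The intent names, definitions, and examples are illustrative, not XiaoAI's actual prompt.

```python
# Illustrative intent definitions and few-shot examples (assumptions).
INTENT_DEFINITIONS = {
    "device_control": "Commands that operate a device or a setting.",
    "translation": "Requests to translate text between languages.",
}

FEW_SHOT_EXAMPLES = [
    ("open settings", "device_control"),
    ("open the air conditioner", "device_control"),
    ("translate 'hello' into French", "translation"),
]

def build_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt for one query."""
    lines = ["Classify the user query into exactly one intent.", "", "Intents:"]
    for name, definition in INTENT_DEFINITIONS.items():
        lines.append(f"- {name}: {definition}")
    lines += ["", "Examples:"]
    for q, intent in FEW_SHOT_EXAMPLES:
        lines.append(f"Query: {q}\nIntent: {intent}")
    # The model is expected to complete the final "Intent:" line.
    lines.append(f"Query: {query}\nIntent:")
    return "\n".join(lines)

prompt = build_prompt("turn up the volume")
```

Note how every added intent definition or example lengthens the prompt for *every* request, which is exactly how the token limit and the 200 ms latency budget were eventually exceeded.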
To address these, a few‑shot prompt with explicit intent definitions was adopted, yet the token overhead remained prohibitive. Consequently, the team turned to fine‑tuning the large model on domain‑specific data, achieving more consistent intent prediction within the required latency budget.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
