How Large Language Models Power XiaoAI: From Intent Routing to Response Generation
This article explores how large language models are integrated into Xiaomi’s XiaoAI assistant, covering the system’s architecture, intent distribution, domain-specific understanding, and response generation. It also shares practical challenges, prompt-engineering solutions, and fine-tuning strategies that boosted user retention and query satisfaction.
Overview of XiaoAI
XiaoAI is an omnipresent AI assistant covering suggestions, voice, vision, translation, and calling, deployed on phones, speakers, TVs, and Xiaomi cars.
Since the release of ChatGPT, the large-model wave has prompted companies in China and abroad to explore large-model applications. Xiaomi decided to rebuild XiaoAI around large models, achieving a 10% increase in next-day user retention and an 8% rise in query satisfaction.
Large Model Intent Distribution
Purpose and Benefits
The intent‑distribution model determines the intent category of a user query and routes it to a domain‑specific agent for deeper understanding. By delegating to specialized agents, model iteration becomes easier and faster, improving the ability to meet user needs.
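The routing described above can be sketched as follows. This is a minimal illustration with hypothetical agent names and a rule-based stand-in classifier; the article does not disclose XiaoAI's actual interfaces, and in production the classification step would be a large-model call.

```python
from typing import Callable, Dict

# Each domain-specific agent is a callable that performs deeper
# understanding of a query (hypothetical agents for illustration).
AGENTS: Dict[str, Callable[[str], str]] = {
    "device_control": lambda q: f"[device_control] handling: {q}",
    "translation":    lambda q: f"[translation] handling: {q}",
    "chitchat":       lambda q: f"[chitchat] handling: {q}",
}

def classify_intent(query: str) -> str:
    """Stand-in for the intent-distribution model (keyword rules here;
    in the real system this is a large-model prediction)."""
    if "translate" in query:
        return "translation"
    if "open" in query or "turn on" in query:
        return "device_control"
    return "chitchat"

def route(query: str) -> str:
    """Determine the intent, then delegate to that domain's agent."""
    intent = classify_intent(query)
    return AGENTS[intent](query)

print(route("open the air conditioner"))  # dispatched to device_control
```

Because each agent owns its own domain, an agent can be retrained or replaced without touching the router or the other agents, which is the iteration benefit the article describes.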
Challenges
Two main difficulties arise: (1) the model requires knowledge to distinguish similar commands (e.g., "open settings" vs. "open air‑conditioner"); (2) latency must stay under 200 ms for XiaoAI scenarios.
Approaches and Fine‑tuning
Initially, the team relied on prompt engineering: few-shot prompts defined each intent and provided sample queries. This mitigated some issues but introduced new ones: the prompt grew long enough to exceed token limits and violate the latency constraint, and smaller models often ignored the instructions; only very large models (hundreds of billions of parameters) followed the prompts reliably.
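A few-shot routing prompt of the kind described above might be assembled like this. The intent names, definitions, and examples are illustrative, not XiaoAI's actual prompt.

```python
# Illustrative intent definitions and few-shot examples (assumptions).
INTENT_DEFINITIONS = {
    "device_control": "Commands that operate a device or a setting.",
    "translation": "Requests to translate text between languages.",
}

FEW_SHOT_EXAMPLES = [
    ("open settings", "device_control"),
    ("open the air conditioner", "device_control"),
    ("translate 'hello' into French", "translation"),
]

def build_prompt(query: str) -> str:
    """Assemble a few-shot classification prompt for one query."""
    lines = ["Classify the user query into exactly one intent.", "", "Intents:"]
    for name, definition in INTENT_DEFINITIONS.items():
        lines.append(f"- {name}: {definition}")
    lines += ["", "Examples:"]
    for q, intent in FEW_SHOT_EXAMPLES:
        lines.append(f"Query: {q}\nIntent: {intent}")
    # The model is expected to complete the final "Intent:" line.
    lines.append(f"Query: {query}\nIntent:")
    return "\n".join(lines)

prompt = build_prompt("turn up the volume")
```

Note how every added intent definition or example lengthens the prompt for *every* request, which is exactly how the token limit and the 200 ms latency budget were eventually exceeded.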
To address these, a few‑shot prompt with explicit intent definitions was adopted, yet the token overhead remained prohibitive. Consequently, the team turned to fine‑tuning the large model on domain‑specific data, achieving more consistent intent prediction within the required latency budget.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
