How Large Language Models Power Xiaomi’s Xiao AI Assistant
This article explains how Xiaomi’s Xiao AI assistant leverages large language models for intent routing, domain‑specific intent understanding, and response generation, detailing the system architecture, challenges such as knowledge requirements and latency constraints, and the shift from prompt engineering to model fine‑tuning.
Introduction
This article introduces the application of large models in the Xiao AI product, sharing practices in intent routing, intent understanding, and response generation.
Contents
Xiao AI overview
Large model intent routing
Domain‑specific intent understanding
Response generation
Xiao AI Overview
Xiao AI is an omnipresent AI assistant. Its product line includes suggestions, voice, vision, translation, and calls, and it runs on phones, speakers, TVs, and Xiaomi cars.
Impact of Large Models
Since the launch of ChatGPT, the large‑model wave has pushed many companies to adopt LLMs. After Xiao AI was rebuilt on large models, both product experience and user retention improved: next‑day retention of active users rose by 10%, and satisfaction on long‑tail queries rose by 8%.
System Architecture
When a user query arrives, a large‑model intent routing component first determines the query’s intent and routes it to downstream domain‑specific agents. Each domain agent has its own intent‑understanding large model.
Large Model Intent Routing
The purpose of intent routing is to classify the intent of the user query and forward the query to the appropriate domain agent for deeper understanding. This modular design makes each model easier to iterate on and improves efficiency.
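The routing flow described above can be sketched as a small dispatcher. This is a minimal illustration, not Xiaomi's implementation: the domain names (`smart_home`, `device_control`, `chitchat`), the agent functions, and the keyword rules standing in for the routing model are all hypothetical.

```python
from typing import Callable, Dict


def route_intent(query: str) -> str:
    """Stand-in for the routing model (hypothetical keyword rules, not an LLM)."""
    q = query.lower()
    if "air conditioner" in q:
        return "smart_home"
    if "settings" in q:
        return "device_control"
    return "chitchat"


# Hypothetical domain agents. In the article, each agent has its own
# intent-understanding model; here each one just tags the query.
def smart_home_agent(query: str) -> str:
    return f"[smart_home] handling: {query}"


def device_control_agent(query: str) -> str:
    return f"[device_control] handling: {query}"


def chitchat_agent(query: str) -> str:
    return f"[chitchat] handling: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "smart_home": smart_home_agent,
    "device_control": device_control_agent,
    "chitchat": chitchat_agent,
}


def handle(query: str) -> str:
    """Route the query, then hand it to the matching domain agent."""
    domain = route_intent(query)
    # Fall back to chitchat if the router returns an unknown domain.
    agent = AGENTS.get(domain, chitchat_agent)
    return agent(query)
```

Because the router only needs to pick a domain, each agent can be changed or replaced without touching the others, which is the modularity the section refers to.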
Challenges
The model needs knowledge to correctly interpret intents (e.g., distinguishing "open settings" from "open air conditioner").
Latency must be kept under 200 ms for a responsive Xiao AI experience.
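One way to make the latency constraint concrete is to time each routing call against the 200 ms budget. The sketch below is an assumption about how such a check could look; the function name `timed_route` and the idea of returning a within-budget flag are illustrative and do not come from the article.

```python
import time
from typing import Callable, Tuple

LATENCY_BUDGET_MS = 200.0  # budget stated in the article


def timed_route(router: Callable[[str], str], query: str) -> Tuple[str, float, bool]:
    """Call the router and report (intent, elapsed milliseconds, within budget)."""
    start = time.perf_counter()
    intent = router(query)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return intent, elapsed_ms, elapsed_ms <= LATENCY_BUDGET_MS
```

A caller could log every request that comes back with the flag set to `False`, or use the measurement to compare candidate models before deployment.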
Attempts with Large Models
Initially, prompt engineering was used: a task was defined and the large model was asked to output the intent embedded in the user request. However, two main problems emerged.
Issues with Prompt Engineering
The model’s output intent sometimes did not match the predefined intent taxonomy.
Only very large models (hundreds of billions of parameters) reliably followed the instructions; smaller models tended to answer the user's query directly instead of outputting an intent.
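Both issues show up as model output that is not a valid intent label, so a simple guard is to check the output against the taxonomy before routing. The following is a minimal sketch under assumptions: the taxonomy entries and the normalization rule (lowercase, spaces to underscores) are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, Optional

# Hypothetical taxonomy; the article does not list the real intent labels.
INTENT_TAXONOMY: FrozenSet[str] = frozenset(
    {"device_control", "smart_home", "chitchat"}
)


def normalize_intent(
    raw_output: str, taxonomy: FrozenSet[str] = INTENT_TAXONOMY
) -> Optional[str]:
    """Return a taxonomy label, or None if the model output is not one.

    None covers both failure modes: an intent outside the taxonomy and a
    direct answer to the user instead of a label.
    """
    candidate = raw_output.strip().lower().replace(" ", "_")
    return candidate if candidate in taxonomy else None
```

A `None` result could trigger a retry or a fallback route; which of these is appropriate depends on the latency budget.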
Few‑Shot Prompting and Fine‑Tuning
To mitigate these issues, a few‑shot approach was adopted: the prompt defined each intent and provided example queries. This alleviated some of the problems, but it also increased the prompt's token length, which lengthened inference time and violated the latency constraint. Consequently, fine‑tuning the large model was explored as a more practical solution.
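The trade-off can be seen by building such a prompt and watching its length grow with each example. This is an illustrative sketch: the prompt wording, intent descriptions, and example queries are hypothetical, and the whitespace word count is only a crude proxy for the token count a real tokenizer would report.

```python
from typing import Dict, List, Tuple


def build_prompt(
    query: str,
    intents: Dict[str, str],
    examples: List[Tuple[str, str]],
) -> str:
    """Assemble a few-shot intent-classification prompt (hypothetical wording)."""
    lines = ["Classify the user query into exactly one of these intents:"]
    for name, description in intents.items():
        lines.append(f"- {name}: {description}")
    for example_query, intent in examples:
        lines.append(f"Query: {example_query}")
        lines.append(f"Intent: {intent}")
    lines.append(f"Query: {query}")
    lines.append("Intent:")
    return "\n".join(lines)


def rough_length(prompt: str) -> int:
    """Whitespace word count, used here only as a rough stand-in for tokens."""
    return len(prompt.split())


# Hypothetical intents and examples for illustration.
INTENTS = {
    "device_control": "change a setting on the current device",
    "smart_home": "control a connected home appliance",
}
EXAMPLES = [
    ("open settings", "device_control"),
    ("open air conditioner", "smart_home"),
]

zero_shot = build_prompt("turn on the lamp", INTENTS, [])
few_shot = build_prompt("turn on the lamp", INTENTS, EXAMPLES)
```

Every added intent definition and example lengthens the prompt that must be processed on each request, which is why the few-shot variant runs into the latency budget, whereas a fine-tuned model can be given a much shorter prompt.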
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
