How Large Language Models Power Xiaomi’s Xiao AI Assistant
This article explains how large language models are integrated into Xiaomi’s Xiao AI assistant, covering intent distribution, domain‑specific intent understanding, response generation, architectural design, challenges such as knowledge requirements and latency, and the shift from prompt engineering to model fine‑tuning.
Overview of Xiao AI
Xiao AI is Xiaomi’s cross‑device AI assistant. It offers services such as suggestions, voice interaction, visual recognition, translation, and calling on devices including phones, speakers, TVs, and Xiaomi cars.
Why Apply Large Models
Following the ChatGPT wave, many companies, including Xiaomi, explored large‑model integration to improve product experience. Rebuilding Xiao AI with large models increased next‑day user retention by 10% and query satisfaction by 8%.
System Architecture
The overall architecture follows a divide‑and‑conquer approach: a user query first passes through an intent‑distribution large model, which routes the query to domain‑specific agents. Each agent contains its own intent‑understanding model for deeper comprehension.
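The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not Xiaomi's implementation: the domain names, the `Agent` class, and the keyword-based `classify_domain` function are all hypothetical stand-ins, and in the real system both stages are large models.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Agent:
    """A domain agent with its own (here: stubbed) intent-understanding step."""
    name: str
    understand: Callable[[str], str]

    def handle(self, query: str) -> str:
        return f"[{self.name}] {self.understand(query)}"


def classify_domain(query: str) -> str:
    """Stand-in for the intent-distribution model: a keyword lookup, not an LLM."""
    q = query.lower()
    if "air conditioner" in q:
        return "smart_home"
    if "settings" in q:
        return "device_control"
    return "chitchat"


# Hypothetical domains; the article does not list the real ones.
AGENTS: Dict[str, Agent] = {
    "smart_home": Agent("smart_home", lambda q: f"smart-home command: {q}"),
    "device_control": Agent("device_control", lambda q: f"device command: {q}"),
    "chitchat": Agent("chitchat", lambda q: f"open-ended reply to: {q}"),
}


def route(query: str) -> str:
    """Stage 1 picks the domain; stage 2 lets that domain's agent interpret the query."""
    domain = classify_domain(query)
    return AGENTS[domain].handle(query)
```

Because the router only chooses a domain, each agent can be retrained or replaced without touching the others, which is the point of the divide‑and‑conquer design.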
Large‑Model Intent Distribution
The goal of intent distribution is to identify the intent category of a query and route it to the appropriate domain agent. Each agent can then focus on a narrow set of intents, which simplifies model iteration and improves efficiency.
Two main challenges arise:
Models need sufficient knowledge to distinguish similar intents (e.g., “open settings” vs. “open air conditioner”).
Latency must stay below 200 ms for a smooth user experience.
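One simple way to keep the latency requirement visible during development is to time each call to the distribution step against the budget. The sketch below assumes only the 200 ms figure from the text; the helper name `timed_call` and the idea of wrapping the classifier in a timing function are illustrative, not something the article describes.

```python
import time
from typing import Callable, Tuple

LATENCY_BUDGET_MS = 200.0  # budget stated in the text above


def timed_call(fn: Callable[[str], str], query: str) -> Tuple[str, float, bool]:
    """Run fn(query) and return (result, elapsed_ms, within_budget)."""
    start = time.perf_counter()
    result = fn(query)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms < LATENCY_BUDGET_MS
```

Here `fn` would be whatever function wraps the intent-distribution model; any plain Python callable taking a query string works.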
Initial Attempts with Prompt Engineering
Early attempts used prompt engineering: the prompt defined the task and supplied few‑shot examples to coax the large model into extracting intents. This approach faced two problems. First, the model’s output often did not match any of the predefined intents. Second, smaller models struggled to follow the instructions, so the approach only worked with models at the billion‑parameter scale.
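A few-shot prompt of the kind described here, together with a check for the "output does not match a predefined intent" problem, might look like the sketch below. The intent labels, example queries, and function names are hypothetical; the article does not give the actual prompt or label set.

```python
from typing import List, Optional, Tuple

INTENTS = ["device_control", "smart_home", "chitchat"]  # hypothetical label set

# Hypothetical few-shot examples, reusing the queries mentioned in the text.
FEW_SHOT: List[Tuple[str, str]] = [
    ("open settings", "device_control"),
    ("open air conditioner", "smart_home"),
]


def build_prompt(query: str) -> str:
    """Assemble task definition, few-shot examples, and the new query."""
    lines = [
        "Classify the user query into exactly one of these intents: "
        + ", ".join(INTENTS) + ".",
        "Answer with the intent name only.",
        "",
    ]
    for example_query, intent in FEW_SHOT:
        lines.append(f"Query: {example_query}")
        lines.append(f"Intent: {intent}")
        lines.append("")
    lines.append(f"Query: {query}")
    lines.append("Intent:")
    return "\n".join(lines)


def parse_intent(model_output: str) -> Optional[str]:
    """Return the intent if the output is a predefined label, else None."""
    candidate = model_output.strip().lower()
    return candidate if candidate in INTENTS else None
```

Every request carries the full instruction and example block, so the prompt grows with the number of intents and examples. A free-form answer such as "The user wants to open the AC" fails `parse_intent` and has to be handled separately.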
Moving Toward Fine‑Tuning
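To address the token‑length and latency constraints of prompt engineering, the team explored fine‑tuning the large model on intent‑distribution data. A fine‑tuned model learns the task from its training examples instead of having it restated in every request, which reduced reliance on extensive prompts while maintaining accuracy and speed.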
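As a rough illustration of what intent-distribution training data could look like, the sketch below serializes (query, intent) pairs as JSON Lines. The `prompt`/`completion` field names and the short prompt template are assumptions chosen for the example; the article does not specify the data format or the training framework.

```python
import json
from typing import Iterable, List, Tuple


def to_jsonl(examples: Iterable[Tuple[str, str]]) -> str:
    """Serialize (query, intent) pairs as one JSON object per line."""
    lines: List[str] = []
    for query, intent in examples:
        record = {
            "prompt": f"Query: {query}\nIntent:",  # short prompt, no few-shot block
            "completion": f" {intent}",
        }
        # ensure_ascii=False keeps non-ASCII queries (e.g. Chinese) readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Compared with the few-shot setup, each training prompt contains only the query, so the input seen at inference time stays short.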
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.