How Large Language Models Power Xiaomi’s Xiao AI Assistant

This article explains how large language models are integrated into Xiaomi’s Xiao AI assistant, covering intent distribution, domain‑specific intent understanding, response generation, architectural design, challenges such as knowledge requirements and latency, and the shift from prompt engineering to model fine‑tuning.

DataFunTalk

Oct 10, 2025

Overview of Xiao AI

Xiao AI is an omnipresent AI assistant offering services such as suggestions, voice interaction, visual recognition, translation, and calls across devices including phones, speakers, TVs, and Xiaomi cars.

Why Apply Large Models

Following the ChatGPT wave, many companies, including Xiaomi, explored large‑model integration to improve product experience. Rebuilding Xiao AI with large models increased next‑day user retention by 10% and query satisfaction by 8%.

System Architecture

The overall architecture follows a divide‑and‑conquer approach: a user query first passes through an intent‑distribution large model, which routes the query to domain‑specific agents. Each agent contains its own intent‑understanding model for deeper comprehension.
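The divide‑and‑conquer flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Xiaomi's implementation: the domain names, the agent functions, and the keyword rules standing in for the intent‑distribution model are all hypothetical.

```python
from typing import Callable, Dict


def classify_domain(query: str) -> str:
    """Stand-in for the intent-distribution model.

    Assumption: simple keyword rules replace the real model so the
    sketch stays runnable. The domain names are hypothetical.
    """
    text = query.lower()
    if "air conditioner" in text or "light" in text:
        return "smart_home"
    if "settings" in text or "bluetooth" in text:
        return "device_control"
    return "chitchat"


# Each domain agent would hold its own intent-understanding model;
# here each one only reports that it received the query.
def smart_home_agent(query: str) -> str:
    return f"[smart_home] handling: {query}"


def device_control_agent(query: str) -> str:
    return f"[device_control] handling: {query}"


def chitchat_agent(query: str) -> str:
    return f"[chitchat] handling: {query}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "smart_home": smart_home_agent,
    "device_control": device_control_agent,
    "chitchat": chitchat_agent,
}


def handle(query: str) -> str:
    """Route the query to one domain agent and return its response."""
    domain = classify_domain(query)
    # Unknown domains fall back to the chitchat agent (an assumption).
    agent = AGENTS.get(domain, chitchat_agent)
    return agent(query)
```

Because the router only picks a domain, each agent can be retrained or replaced without touching the others, which is the iteration benefit the architecture is aiming for.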

Large‑Model Intent Distribution

The goal of intent distribution is to identify the intent category of a query and route it to the appropriate domain agent. Each agent can then focus on a narrow set of intents, which simplifies model iteration and improves efficiency.

Two main challenges arise:

Models need sufficient knowledge to distinguish similar intents (e.g., “open settings” vs. “open air conditioner”).

Latency must stay below 200 ms for a smooth user experience.
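One way to make the 200 ms requirement concrete is to enforce it at the call site. The sketch below is an assumption about how such a budget could be applied, not something the article describes: the fallback domain, the function names, and the thread‑pool approach are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

LATENCY_BUDGET_S = 0.2        # the 200 ms budget mentioned in the text
FALLBACK_DOMAIN = "chitchat"  # hypothetical default when the budget is exceeded

# A shared pool, so a slow call does not block the caller on shutdown.
_executor = ThreadPoolExecutor(max_workers=4)


def route_with_budget(classify, query, budget_s=LATENCY_BUDGET_S):
    """Run the classifier, but stop waiting once the budget is spent.

    Note: the worker thread keeps running after a timeout; this only
    bounds how long the caller waits for the answer.
    """
    future = _executor.submit(classify, query)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        return FALLBACK_DOMAIN
```

A production system would more likely meet the budget by choosing a small enough model and a short enough prompt. A guard like this only protects the user experience when that fails.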

Initial Attempts with Prompt Engineering

Early attempts used prompt engineering: the prompt defined the task and supplied few‑shot examples so that the large model would extract the intent. This approach had two problems. First, the model’s output often did not match any of the predefined intents. Second, smaller models struggled to follow the instructions, so the approach required models with billions of parameters.
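To illustrate the prompt‑engineering setup and its first failure mode, here is a minimal sketch that builds a few‑shot prompt and checks whether the model's raw output is one of the predefined intents. The intent labels, the example queries, and the prompt wording are hypothetical. The model call itself is left out.

```python
from typing import Optional

# Hypothetical intent labels and few-shot examples, for illustration only.
INTENTS = ["device_control", "smart_home", "chitchat"]
FEW_SHOT = [
    ("open settings", "device_control"),
    ("open air conditioner", "smart_home"),
]


def build_prompt(query: str) -> str:
    """Assemble the task definition, the few-shot examples, and the new query."""
    lines = [
        "Classify the user query into exactly one of these intents: "
        + ", ".join(INTENTS) + ".",
        "",
    ]
    for example_query, intent in FEW_SHOT:
        lines.append(f"Query: {example_query}")
        lines.append(f"Intent: {intent}")
        lines.append("")
    lines.append(f"Query: {query}")
    lines.append("Intent:")
    return "\n".join(lines)


def parse_intent(raw_output: str) -> Optional[str]:
    """Return the predefined label, or None when the output matches none."""
    cleaned = raw_output.strip().lower().rstrip(".")
    return cleaned if cleaned in INTENTS else None
```

Every extra few‑shot example lengthens the prompt, and any free‑form answer from the model is rejected by `parse_intent`. Both effects match the problems described above.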

Moving Toward Fine‑Tuning

Prompt engineering ran into token‑length and latency constraints: few‑shot prompts are long, and the large models needed to follow them are slow to run. To address this, the team explored fine‑tuning the large model on intent‑distribution data. This reduced reliance on extensive prompts while maintaining accuracy and speed.
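As a rough illustration of what intent‑distribution fine‑tuning data might look like, the sketch below converts labeled queries into JSONL records with a short prompt and the target intent. The record fields (`prompt`, `completion`), the labels, and the example queries are assumptions. The article does not describe the actual data format or training setup.

```python
import json

# Hypothetical labeled queries: (user query, intent label).
LABELED = [
    ("open settings", "device_control"),
    ("open air conditioner", "smart_home"),
    ("tell me a joke", "chitchat"),
]


def to_sft_record(query: str, intent: str) -> dict:
    """One training record: a short prompt with no few-shot examples."""
    return {"prompt": f"Query: {query}\nIntent:", "completion": f" {intent}"}


def to_jsonl(examples) -> str:
    """Serialize the records as one JSON object per line."""
    return "\n".join(
        json.dumps(to_sft_record(query, intent), ensure_ascii=False)
        for query, intent in examples
    )
```

After fine‑tuning on records like these, the task definition and examples no longer need to travel with every request, so the inference prompt shrinks to the query itself.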
