How Large Language Models Power Xiaomi’s Xiao AI Assistant

This article explains how Xiaomi’s Xiao AI assistant leverages large language models for intent routing, domain‑specific intent understanding, and response generation, detailing the system architecture, challenges such as knowledge requirements and latency constraints, and the shift from prompt engineering to model fine‑tuning.

DataFunTalk

Oct 22, 2025

Introduction

This article describes how large models are applied in the Xiao AI product and shares practices in intent routing, intent understanding, and response generation.

Xiao AI overview

Large model intent routing

Domain‑specific intent understanding

Response generation

Xiao AI Overview

Xiao AI is an omnipresent AI assistant. Its product line includes suggestions, voice, vision, translation, and calls, and it runs on phones, speakers, TVs, and Xiaomi cars.

Impact of Large Models

Since the launch of ChatGPT, the large‑model wave has prompted companies to adopt LLMs. Rebuilding Xiao AI on large models improved both product experience and user retention: next‑day active‑user retention increased by 10%, and long‑tail query satisfaction rose by 8%.

System Architecture

When a user query arrives, a large‑model intent routing component first determines the query’s intent and routes the query to the matching downstream domain‑specific agent. Each domain agent has its own intent‑understanding large model.
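
The two-stage flow above can be sketched as a router that picks a domain and a table of domain agents that handle the query. This is a minimal illustration, not Xiao AI's implementation: the domain names, the agent functions, and the keyword rules standing in for the routing model are all hypothetical.

```python
from typing import Callable, Dict

# A domain agent takes the raw query and returns a response string.
DomainAgent = Callable[[str], str]


def system_settings_agent(query: str) -> str:
    # Hypothetical agent; in the article each agent has its own
    # intent-understanding large model.
    return f"[system_settings] handling: {query}"


def device_control_agent(query: str) -> str:
    return f"[device_control] handling: {query}"


def chat_agent(query: str) -> str:
    return f"[chat] handling: {query}"


# Hypothetical domain names; the article does not list the real ones.
AGENTS: Dict[str, DomainAgent] = {
    "system_settings": system_settings_agent,
    "device_control": device_control_agent,
    "chat": chat_agent,
}


def route_intent(query: str) -> str:
    """Stand-in for the large-model router (keyword rules, illustration only)."""
    text = query.lower()
    if "air conditioner" in text:
        return "device_control"
    if "settings" in text:
        return "system_settings"
    return "chat"


def handle(query: str) -> str:
    """Stage 1: route to a domain. Stage 2: let that domain's agent respond."""
    domain = route_intent(query)
    agent = AGENTS.get(domain, chat_agent)  # unknown domains fall back to chat
    return agent(query)


if __name__ == "__main__":
    for q in ("open settings", "open air conditioner", "tell me a joke"):
        print(handle(q))
```

Keeping the router and the agents behind a plain lookup table means a single domain's model can be swapped without touching the others, which is the modularity the architecture relies on.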

Large Model Intent Routing

Intent routing classifies the user query’s intent and forwards the query to the appropriate domain agent for deeper understanding. Because each domain is handled by its own model, this modular design makes each model easier to iterate on and improves efficiency.

Challenges

The model needs background knowledge to interpret intents correctly (e.g., to recognize that "open settings" and "open air conditioner" share a verb but express different intents).

Latency must be kept under 200 ms for a responsive Xiao AI experience.
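
One simple way to keep the 200 ms constraint visible during development is to time every routing call against the budget. The sketch below assumes a router is any callable that maps a query to an intent string; `timed_route` and the two toy routers are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Tuple

LATENCY_BUDGET_MS = 200.0  # the budget stated in the article


def timed_route(router: Callable[[str], str], query: str) -> Tuple[str, float, bool]:
    """Run the router and report (intent, elapsed_ms, within_budget)."""
    start = time.perf_counter()
    intent = router(query)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return intent, elapsed_ms, elapsed_ms <= LATENCY_BUDGET_MS


def fast_router(query: str) -> str:
    # Toy router that returns immediately.
    return "system_settings"


def slow_router(query: str) -> str:
    # Toy router that simulates a slow model call.
    time.sleep(0.25)
    return "system_settings"


if __name__ == "__main__":
    for router in (fast_router, slow_router):
        intent, elapsed_ms, ok = timed_route(router, "open settings")
        print(f"{router.__name__}: {intent}, {elapsed_ms:.1f} ms, within budget: {ok}")
```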

Attempts with Large Models

The first attempt used prompt engineering: the prompt defined the task and asked the large model to output the intent expressed in the user request. However, two main problems emerged.
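
A zero-shot prompt of this kind might look like the sketch below. The wording of the template is an assumption; the article does not give the actual prompt used for Xiao AI.

```python
# Hypothetical zero-shot template: a task description plus the user request,
# with no intent definitions and no examples.
ZERO_SHOT_TEMPLATE = (
    "You are an intent classifier for a voice assistant.\n"
    "Read the user request and output only the name of its intent.\n"
    "User request: {query}\n"
    "Intent:"
)


def build_zero_shot_prompt(query: str) -> str:
    """Fill the template with the (whitespace-trimmed) user request."""
    return ZERO_SHOT_TEMPLATE.format(query=query.strip())


if __name__ == "__main__":
    print(build_zero_shot_prompt("open air conditioner"))
```

Because nothing in this prompt constrains the label set, the model is free to invent its own intent names, which leads to the first issue described next.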

Issues with Prompt Engineering

The model’s output intent sometimes did not match the predefined intent taxonomy.

Only very large models (hundreds of billions of parameters) reliably followed instructions; smaller models tended to answer the user’s request directly instead of outputting an intent label.
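
Both failure modes can be detected by checking the raw model output against the taxonomy before routing on it. The following is a minimal sketch under assumed names: the taxonomy labels and `normalize_intent` are hypothetical, and the article does not say how out-of-taxonomy outputs were actually handled.

```python
from typing import Optional

# Hypothetical taxonomy; the article does not list the real intent labels.
INTENT_TAXONOMY = {"system_settings", "device_control", "chat"}


def normalize_intent(raw_output: str) -> Optional[str]:
    """Map raw model text to a taxonomy label, or None if it does not match."""
    label = raw_output.strip().strip(".\"'").lower().replace(" ", "_")
    return label if label in INTENT_TAXONOMY else None


if __name__ == "__main__":
    samples = [
        " Device Control. ",                     # valid label, loosely formatted
        "turn_on_ac",                            # invented label outside the taxonomy
        "Sure, the settings page is now open.",  # model answered instead of classifying
    ]
    for raw in samples:
        print(repr(raw), "->", normalize_intent(raw))
```

A `None` result signals that the output cannot be routed as is; the caller could retry, fall back to a default domain, or log the case for later analysis.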

Few‑Shot Prompting and Fine‑Tuning

To mitigate these issues, few‑shot prompting was adopted: the prompt defined the intents and provided example queries. This alleviated some of the problems, but the longer prompt increased the token count and violated the latency constraint. Fine‑tuning the large model was therefore explored as a more practical solution.
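
The trade-off can be made concrete by comparing the length of a few-shot prompt with the short prompt a fine-tuned model might need. Everything in this sketch is an assumption for illustration: the intent definitions, the examples, the prompt wording, the idea that a fine-tuned model needs only the bare request, and the whitespace word count used as a crude stand-in for a real tokenizer.

```python
from typing import Dict, List, Tuple

# Hypothetical intent definitions and labeled examples.
INTENT_DEFINITIONS: Dict[str, str] = {
    "system_settings": "Open or change a setting on the current device.",
    "device_control": "Control a connected smart-home appliance.",
    "chat": "Open-ended conversation or question answering.",
}
EXAMPLES: List[Tuple[str, str]] = [
    ("open settings", "system_settings"),
    ("open air conditioner", "device_control"),
    ("tell me a joke", "chat"),
]


def build_few_shot_prompt(query: str) -> str:
    """Prompt with intent definitions and examples: accurate but long."""
    lines = ["Classify the user request into exactly one of these intents:"]
    lines += [f"- {name}: {desc}" for name, desc in INTENT_DEFINITIONS.items()]
    lines.append("Examples:")
    lines += [f"Request: {q}\nIntent: {label}" for q, label in EXAMPLES]
    lines.append(f"Request: {query}\nIntent:")
    return "\n".join(lines)


def build_short_prompt(query: str) -> str:
    """Assumed prompt for a fine-tuned model that has already learned the labels."""
    return f"Request: {query}\nIntent:"


def rough_length(text: str) -> int:
    """Whitespace word count; only a rough proxy for tokenizer output."""
    return len(text.split())


def to_training_record(query: str, label: str) -> Dict[str, str]:
    """Turn a labeled example into a prompt/completion pair for fine-tuning."""
    return {"prompt": build_short_prompt(query), "completion": label}


if __name__ == "__main__":
    q = "open air conditioner"
    print("few-shot length:", rough_length(build_few_shot_prompt(q)))
    print("short length:   ", rough_length(build_short_prompt(q)))
    print(to_training_record(*EXAMPLES[1]))
```

The few-shot prompt grows with every intent and example added, while the short prompt stays constant; fine-tuning moves that knowledge out of the prompt and into the training data.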
