How Large Language Models Power Xiaomi’s Xiao AI Assistant

This article explains how Xiaomi’s Xiao AI assistant leverages large language models for intent routing, domain‑specific intent understanding, and response generation, detailing the system architecture, challenges such as knowledge requirements and latency constraints, and the shift from prompt engineering to model fine‑tuning.

DataFunTalk

Introduction

This article introduces the application of large models in the Xiao AI product, sharing practices in intent routing, intent understanding, and response generation.

Contents

Xiao AI overview

Large model intent routing

Domain‑specific intent understanding

Response generation

Xiao AI Overview

Xiao AI is an omnipresent AI assistant. Its product line includes suggestions, voice, vision, translation, and calls, and it runs on phones, speakers, TVs, and Xiaomi cars.

Impact of Large Models

Since the launch of ChatGPT, the large‑model wave has prompted companies to adopt LLMs. Rebuilding Xiao AI on large models improved both product experience and user retention: next‑day active‑user retention increased by 10%, and satisfaction on long‑tail queries rose by 8%.

System Architecture

Xiao AI system architecture

When a user query arrives, a large‑model intent routing component first determines the query’s intent and routes it to downstream domain‑specific agents. Each domain agent has its own intent‑understanding large model.
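The two-stage flow described above can be sketched as a small dispatcher. This is a minimal illustration, not Xiaomi's implementation: the domain names, the `route_query` function, the fallback domain, and the keyword-based `toy_classifier` (standing in for the routing model) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RoutedResult:
    domain: str    # domain chosen by the routing step
    response: str  # output of that domain's agent


def route_query(
    query: str,
    classify_domain: Callable[[str], str],
    agents: Dict[str, Callable[[str], str]],
    fallback_domain: str = "chitchat",  # assumption: a catch-all domain exists
) -> RoutedResult:
    """Stage 1: pick a domain. Stage 2: let that domain's agent handle the query."""
    domain = classify_domain(query)
    if domain not in agents:
        domain = fallback_domain
    return RoutedResult(domain=domain, response=agents[domain](query))


def toy_classifier(query: str) -> str:
    """Keyword rule used only as a stand-in for the routing large model."""
    if "air conditioner" in query:
        return "smart_home"
    if "settings" in query:
        return "device_control"
    return "chitchat"


# Each agent would wrap its own intent-understanding model; here they are stubs.
AGENTS: Dict[str, Callable[[str], str]] = {
    "smart_home": lambda q: f"[smart_home agent] {q}",
    "device_control": lambda q: f"[device_control agent] {q}",
    "chitchat": lambda q: f"[chitchat agent] {q}",
}

result = route_query("open air conditioner", toy_classifier, AGENTS)
print(result.domain, "|", result.response)
```

Keeping the router and the agents behind plain callables means either side can be swapped for a different model without touching the other.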

Large Model Intent Routing

The purpose of intent routing is to classify the user query’s intent and forward it to the appropriate domain agent for deeper understanding. Because each domain has its own model, this modular design makes the models easier to iterate on and improves efficiency.

Challenges

The model needs background knowledge to interpret intents correctly (e.g., distinguishing "open settings" from "open air conditioner", two requests that look alike on the surface but call for different handling).

Latency must be kept under 200 ms for a responsive Xiao AI experience.
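One way to check a routing component against such a budget is to time it over a sample of queries and compare a high percentile with the limit. The sketch below assumes a 95th-percentile criterion and a nearest-rank percentile; the source states only the 200 ms target, so `within_budget`, `percentile`, and `measure_latencies_ms` are illustrative names, not part of any Xiao AI tooling.

```python
import math
import time
from typing import Callable, List, Sequence

LATENCY_BUDGET_MS = 200.0  # the target stated in the text


def measure_latencies_ms(router: Callable[[str], str], queries: Sequence[str]) -> List[float]:
    """Time one router call per query and return wall-clock latencies in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        router(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def within_budget(
    latencies_ms: Sequence[float],
    budget_ms: float = LATENCY_BUDGET_MS,
    pct: float = 95.0,  # assumption: judge the tail, not the mean
) -> bool:
    """True if the chosen percentile of observed latencies fits the budget."""
    return percentile(latencies_ms, pct) <= budget_ms


# Usage with a stub router; a real check would call the deployed model.
samples = measure_latencies_ms(lambda q: "smart_home", ["open air conditioner"] * 20)
print(within_budget(samples))
```

Judging the tail rather than the average matters for an assistant, because a few slow responses are what users notice.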

Attempts with Large Models

Initially, prompt engineering was used: a task was defined and the large model was asked to output the intent embedded in the user request. However, two main problems emerged.
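A zero-shot prompt of the kind described might look like the following. The wording of the prompt, the intent names in `INTENT_TAXONOMY`, and the `build_zero_shot_prompt` helper are assumptions for illustration; the source does not give the actual prompt.

```python
from typing import Sequence

# Hypothetical taxonomy; the real one is not given in the source.
INTENT_TAXONOMY = ["device_control", "smart_home", "weather", "chitchat"]


def build_zero_shot_prompt(query: str, intents: Sequence[str]) -> str:
    """Define the task, list the allowed intents, and ask for a single label."""
    intent_list = ", ".join(intents)
    return (
        "You are an intent classifier for a voice assistant.\n"
        f"Choose exactly one intent from: {intent_list}.\n"
        "Reply with the intent name only. Do not answer the request.\n"
        f"User request: {query}\n"
        "Intent:"
    )


print(build_zero_shot_prompt("open air conditioner", INTENT_TAXONOMY))
```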

Issues with Prompt Engineering

The model’s output intent sometimes did not match the predefined intent taxonomy.

Only very large models (hundreds of billions of parameters) reliably followed instructions; smaller models tended to answer the user’s request directly instead of outputting an intent.
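Both failure modes can at least be detected by checking the raw model output against the taxonomy before routing on it. The sketch below is one such guard under assumed names (`parse_intent`, a hypothetical `INTENT_TAXONOMY`); it detects off-taxonomy output but does not fix the underlying model behavior.

```python
from typing import Optional, Sequence

# Hypothetical taxonomy; the real one is not given in the source.
INTENT_TAXONOMY = ["device_control", "smart_home", "weather", "chitchat"]


def parse_intent(model_output: str, intents: Sequence[str]) -> Optional[str]:
    """Return the taxonomy intent named by the output, or None if it is off-taxonomy.

    Matching is case-insensitive and ignores surrounding whitespace, quotes,
    and a trailing period. Anything else (including a direct answer to the
    user's request) is rejected.
    """
    cleaned = model_output.strip().strip("\"'.").lower()
    by_lower = {intent.lower(): intent for intent in intents}
    return by_lower.get(cleaned)


print(parse_intent(" Smart_Home. ", INTENT_TAXONOMY))
print(parse_intent("Sure, opening settings now.", INTENT_TAXONOMY))
```

A `None` result gives the caller a clear signal to retry, fall back to a default domain, or log the query for later review.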

Few‑Shot Prompting and Fine‑Tuning

To mitigate these issues, a few‑shot approach was adopted: the prompt defined each intent and provided example queries. This alleviated some of the problems but lengthened the prompt, and the extra tokens pushed latency beyond the constraint. Fine‑tuning the large model was therefore explored as a more practical solution.
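The trade-off can be made concrete by comparing a few-shot prompt with the short prompt a fine-tuned model would need. In this sketch the prompt layout, the `prompt`/`completion` record format, and the whitespace-based `rough_token_count` are all assumptions; a real system would count tokens with the model's own tokenizer and follow its training framework's data format.

```python
import json
from typing import Dict, List


def build_few_shot_prompt(query: str, examples_by_intent: Dict[str, List[str]]) -> str:
    """List every intent with its example queries, then append the user request."""
    lines = ["Classify the user request into one of the intents below."]
    for intent, examples in examples_by_intent.items():
        lines.append(f"Intent: {intent}")
        lines.extend(f"  Example: {example}" for example in examples)
    lines.append(f"User request: {query}")
    lines.append("Intent:")
    return "\n".join(lines)


def to_finetune_record(query: str, intent: str) -> str:
    """Serialize one labeled query as a JSON line (hypothetical prompt/completion format)."""
    record = {"prompt": f"User request: {query}\nIntent:", "completion": f" {intent}"}
    return json.dumps(record, ensure_ascii=False)


def rough_token_count(text: str) -> int:
    """Whitespace word count, used only as a crude proxy for token length."""
    return len(text.split())


examples = {
    "device_control": ["open settings"],
    "smart_home": ["open air conditioner", "turn on the lamp"],
}
few_shot = build_few_shot_prompt("open the air purifier", examples)
record = json.loads(to_finetune_record("open the air purifier", "smart_home"))

print(rough_token_count(few_shot), "words in the few-shot prompt")
print(rough_token_count(record["prompt"]), "words in the fine-tuned prompt")
```

The few-shot prompt grows with every intent and example added, while the fine-tuned prompt stays the same size because the taxonomy is learned from the training records instead of being restated on every request.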


Written by

DataFunTalk
