Xiaomi Xiao AI Intelligent Question‑Answering System: Architecture, Techniques, and Applications
This article presents a comprehensive overview of Xiaomi's Xiao AI intelligent QA system, detailing its background, three core answering modules—knowledge‑graph QA, retrieval‑based FAQ, and reading‑comprehension—and the underlying methods such as template matching, cross‑domain semantic parsing, path‑based reasoning, semantic retrieval, and neural matching, while also discussing performance results and practical trade‑offs.
Speaker: Dai Wen, Ph.D., Senior Algorithm Engineer at Xiaomi.
Editor: Gao Wenlong, 58.com.
Platform: DataFunTalk.
Background: In Xiaomi’s phone + AIoT strategy, Xiao AI serves as the AIoT entry point, providing convenient services across smartphones, AI speakers, smart wearables, smart cars, TVs, and children’s devices. It offers six major service categories: content, information query, interaction, control, life services, and basic tools.
The intelligent QA system focuses on factual queries that require objective answers from a large knowledge base.
01 Xiao AI Intelligent QA System Background
Three QA modules are deployed:
Knowledge‑graph QA (graph‑based)
Retrieval‑based FAQ (search‑based)
Reading‑comprehension QA (span‑based)
02 Knowledge‑graph QA
The knowledge graph built by Xiaomi AI Lab covers domains such as books, local life, poetry, real estate, products, music, and personalities.
Three main approaches are used:
Template‑based method: find the most similar template in a template library, parse the query against it, and retrieve the answer from the graph. Templates are collected from high‑frequency online queries and can be expanded automatically: if a Q‑A pair contains the entity “Li Bai” and its answer contains “Tang dynasty”, a template such as "xxx是什么朝代的" ("Which dynasty was xxx from?") is generated. Complex queries with constraints (e.g., "the tallest cat") are handled by extracting ordinal or superlative words and adding sorting constraints.
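The template lookup described above can be sketched as follows. The template patterns, relation names, and the toy graph here are illustrative assumptions, and a real system would use a similarity model rather than exact pattern matching:

```python
import re

# Hypothetical template library: each pattern has one entity slot and
# maps to a graph relation.
TEMPLATES = [
    (re.compile(r"(.+)是什么朝代的"), "dynasty"),   # "Which dynasty is X from?"
    (re.compile(r"(.+)的配偶是谁"), "spouse"),       # "Who is X's spouse?"
]

# Toy knowledge graph: entity -> relation -> value.
GRAPH = {"李白": {"dynasty": "唐朝"}}

def answer(query):
    """Match the query against each template; on a hit, parse out the
    entity and look the relation up in the graph."""
    for pattern, relation in TEMPLATES:
        m = pattern.fullmatch(query)
        if m:
            entity = m.group(1)
            return GRAPH.get(entity, {}).get(relation)
    return None

print(answer("李白是什么朝代的"))  # -> 唐朝
```

In production the "most similar template" step would be a ranking model over many thousands of mined templates, not a regex scan.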
Cross‑domain coarse‑grained semantic parsing: joint intent detection and slot filling with a sequence‑labeling network topped by a CRF layer. Predicting intent and slots jointly improves recall and generalization. Slots and intents are abstracted across domains (e.g., poet → person, poem → work) so that one coarse‑grained model covers many domains, reducing annotation cost.
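The slot‑abstraction step can be illustrated in isolation (the CRF tagger itself is not reproduced here). The specific label names and the mapping table are assumptions for the sketch:

```python
# Assumed coarse-grained mapping: domain-specific slot labels are
# collapsed into shared labels so one model serves many domains.
SLOT_ABSTRACTION = {
    "poet": "person", "singer": "person", "author": "person",
    "poem": "work", "song": "work", "book": "work",
}

def abstract_slots(tagged_tokens):
    """Replace fine-grained BIO slot labels with their coarse-grained
    counterparts, leaving O tags untouched."""
    out = []
    for token, tag in tagged_tokens:
        if tag == "O":
            out.append((token, tag))
        else:
            prefix, _, label = tag.partition("-")
            out.append((token, f"{prefix}-{SLOT_ABSTRACTION.get(label, label)}"))
    return out

print(abstract_slots([("李白", "B-poet"), ("的", "O"), ("静夜思", "B-poem")]))
```

Training on the abstracted labels means new domains mostly reuse existing coarse slots instead of requiring fresh annotation.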
Path‑matching method: for multi‑step reasoning queries, candidate paths are mined from the graph, then matched and ranked. The process includes entity linking, sub‑graph retrieval, and sub‑graph matching. Example SPARQL queries for "Yao Ming's spouse" and "the height of Yao Ming's spouse":

select ?x where { <姚明> <配偶> ?x }
select ?y where { <姚明> <配偶> ?x . ?x <身高> ?y }

Constraints such as numeric filters or ordinal sorting are added to the paths (e.g., height > 200 cm). For matching, sub‑graphs are converted to textual descriptions (e.g., "姚明配偶身高", "Yao Ming spouse height") and a sentence‑pair matching model ranks them against the query.
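The path‑to‑text and ranking steps above can be sketched as follows. The candidate paths are taken from the Yao Ming example; the character‑overlap scorer is an assumed stand‑in for the trained sentence‑pair matching model:

```python
# Candidate reasoning paths mined from the graph: (entity, relation chain).
CANDIDATE_PATHS = [
    ("姚明", ["配偶"]),
    ("姚明", ["配偶", "身高"]),
]

def path_to_text(entity, relations):
    """Serialize a sub-graph path into a textual description, e.g.
    ("姚明", ["配偶", "身高"]) -> "姚明配偶身高"."""
    return entity + "".join(relations)

def char_overlap(a, b):
    """Stand-in scorer: raw character overlap instead of the trained
    sentence-pair matching model."""
    return len(set(a) & set(b))

def best_path(query):
    """Rank candidate paths against the query and return the best one."""
    return max(CANDIDATE_PATHS,
               key=lambda p: char_overlap(query, path_to_text(*p)))

print(best_path("姚明的配偶身高是多少"))  # -> ('姚明', ['配偶', '身高'])
```

In the deployed system the scorer is a neural sentence‑pair model, so paraphrases that share no characters with the path text can still rank correctly.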
The graph‑QA approach achieved 3rd place in CCKS 2020 COVID‑19 KG QA and 1st place in CCKS 2021 life‑service KG QA.
03 Retrieval‑based FAQ
FAQ QA handles unstructured queries (why, how, whether) by retrieving similar Q‑A pairs from a large offline QA library.
Retrieval pipeline:
Multi‑channel recall: term‑based, entity‑based (using entity IDs), and semantic‑based (embedding similarity via a twin network).
Semantic retrieval uses approximate nearest‑neighbor (ANN) algorithms: IVF clusters vectors into inverted lists, IVFPQ additionally compresses them with product quantization, and HNSW builds a hierarchical graph for nearest‑neighbor search. The chosen solution is IVFPQ, offering ~1 ms latency with 188 MB memory for 10 M documents.
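The inverted‑file idea behind IVF can be sketched with numpy. This toy version assigns vectors to a few fixed centroids and probes only the nearest clusters at query time; it omits the product‑quantization compression that IVFPQ adds, and real k‑means training of the centroids:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document embeddings (in production these come from the twin network).
docs = rng.normal(size=(1000, 32)).astype("float32")

# Coarse quantizer: a handful of centroids (a real index would k-means them).
centroids = docs[rng.choice(len(docs), 8, replace=False)]
assign = np.argmin(((docs[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
inverted_lists = {c: np.where(assign == c)[0] for c in range(8)}

def search(query, nprobe=2):
    """Probe the nprobe closest clusters and scan only their lists,
    instead of all 1000 vectors."""
    d2c = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(d2c)[:nprobe]
    cand = np.concatenate([inverted_lists[c] for c in probe])
    dists = ((docs[cand] - query) ** 2).sum(-1)
    return int(cand[np.argmin(dists)])

q = docs[123] + 0.01 * rng.normal(size=32)
print(search(q))  # -> 123
```

The accuracy/latency trade‑off is governed by nprobe: probing more clusters recovers more true neighbors at the cost of scanning more candidates.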
Matching stage:
Traditional feature‑based models vs. deep models (representation‑based like DSSM, interaction‑based like ESIM, and large pre‑trained language models).
Training in two stages: coarse training on user behavior logs (likes, clicks) to generate noisy but massive samples, followed by fine‑tuning on high‑quality manually annotated data.
Data augmentation creates additional positive/negative pairs by finding similar queries.
Additional keyword features are concatenated with PLM embeddings to mitigate semantic focus errors.
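The feature fusion just described can be sketched as a simple concatenation. The two hand‑crafted features here (keyword‑overlap ratio and length difference) are assumed stand‑ins for the production feature set:

```python
import numpy as np

def fuse_features(plm_embedding, query_tokens, answer_tokens):
    """Concatenate a PLM sentence embedding with hand-crafted keyword
    features, so the downstream classifier sees both the semantic
    representation and explicit lexical signals."""
    overlap = len(set(query_tokens) & set(answer_tokens)) / max(len(set(query_tokens)), 1)
    len_diff = abs(len(query_tokens) - len(answer_tokens))
    return np.concatenate([plm_embedding, [overlap, len_diff]])

emb = np.zeros(768)  # e.g. a BERT [CLS] vector
fused = fuse_features(emb, ["how", "to", "reset"], ["reset", "steps"])
print(fused.shape)  # -> (770,)
```

The explicit overlap feature gives the model a direct signal when the PLM embedding over‑focuses on the wrong part of the sentence.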
04 Reading‑Comprehension QA
A span‑based reading‑comprehension model is used only for specific query types when both graph‑QA and retrieval‑QA fail. For example, answering "What is Lu Xun’s surname?" by extracting the answer from Lu Xun’s encyclopedia passage.
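Span‑based decoding picks the (start, end) pair with the highest combined score. A minimal sketch, with hand‑set logits standing in for a trained model's output on the Lu Xun passage:

```python
import numpy as np

def extract_span(tokens, start_logits, end_logits, max_len=5):
    """Pick the (start, end) pair maximizing start_logit + end_logit,
    subject to start <= end < start + max_len — the standard decoding
    step for span-based reading comprehension."""
    best, span = -np.inf, (0, 0)
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(tokens))):
            if s + end_logits[j] > best:
                best, span = s + end_logits[j], (i, j)
    return tokens[span[0]:span[1] + 1]

tokens = ["鲁迅", "原名", "周树人", "，", "姓", "周"]
start = np.array([0.1, 0.0, 0.2, 0.0, 0.0, 2.0])
end   = np.array([0.0, 0.0, 0.3, 0.0, 0.0, 2.5])
print(extract_span(tokens, start, end))  # -> ['周']
```

The max_len constraint keeps decoding from pairing a confident start with a far‑away confident end, which matters for short factual answers like a surname.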
05 Summary
The system combines three QA technologies:
KBQA : high accuracy, good experience, limited recall – covers head queries.
FAQ : broad coverage, good generalization, variable data quality – serves as a fallback for long‑tail queries.
Reading‑Comprehension: expands answering capability and improves the user experience, but with lower accuracy.
Thank you for listening.