
Intelligent Question Answering in QQ Browser Search Engine: KBQA, DeepQA, and IRQA

This article presents the architecture, techniques, and practical solutions behind intelligent question answering in QQ Browser's search engine, covering knowledge‑graph based QA (KBQA), machine‑reading‑comprehension QA (DeepQA), and information‑retrieval QA (IRQA), and discusses system design, model optimization, and future directions.

DataFunTalk

In recent years, the rapid evolution of search, voice interaction, and smart customer service has greatly expanded the application scenarios of question answering (QA) technology. This talk introduces how intelligent QA is integrated into the QQ Browser search engine to meet user intent more precisely and accelerate the engine’s intelligent upgrade.

The search engine has progressed from manual classification to text retrieval, then integrated analysis, and now to a fourth generation of intelligent search, which relies on machine learning and NLP to deliver comprehensive, timely, and fine-grained results, including structured knowledge and multimodal content.

Search queries can be divided into navigation, resource, and information demands, with information-type queries—especially QA—accounting for roughly 25–30% of total searches. QA answers take various forms: plain text, lists, domain-specific (medical, legal) content, structured data, and short factual answers.

The system is organized into three technical lines:

KBQA: knowledge‑graph based reasoning, using structured inference and end‑to‑end neural approaches to answer queries like “Where is the Eiffel Tower?”.

DeepQA: machine‑reading‑comprehension (MRC) that extracts answers from web text, handling diverse scenarios such as voice assistants and e‑commerce customer service.

IRQA: retrieval‑based QA that leverages large FAQ collections (UGC and PGC) to match queries with pre‑written answers.

KBQA – The QQ Browser knowledge graph contains billions of SPO triples across many domains. Given its scale, a structured reasoning pipeline was chosen, consisting of Query parsing (mention detection, nested template matching), Operator engine (generating and executing operator chains), Graph engine (Neo4j and inverted index), and Ranking. The pipeline handles short user queries, simple SPO lookups, occasional nested queries, and requires high interpretability and flexible customization.
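The structured-reasoning flow above can be sketched in miniature: detect an entity mention, match a query template to obtain a predicate, and look the (subject, predicate) pair up in a triple store. The entity list, templates, and in-memory store below are hypothetical stand-ins for the real mention detector, nested template matcher, and graph engine.

```python
# Toy SPO store standing in for the graph engine (Neo4j + inverted index)
TRIPLES = {
    ("Eiffel Tower", "location"): "Paris, France",
    ("Eiffel Tower", "height"): "330 m",
}

# (pattern, predicate) pairs; real templates support nesting
TEMPLATES = [
    ("where is the {e}", "location"),
    ("how tall is the {e}", "height"),
]

def parse(query: str, entities: list[str]):
    """Mention detection + template matching -> (subject, predicate)."""
    q = query.lower().rstrip("?")
    for ent in entities:                          # mention detection (exact match)
        if ent.lower() in q:
            stripped = q.replace(ent.lower(), "{e}")
            for pattern, predicate in TEMPLATES:  # template matching
                if stripped == pattern:
                    return ent, predicate
    return None

def answer(query: str):
    parsed = parse(query, entities=["Eiffel Tower"])
    return TRIPLES.get(parsed) if parsed else None  # graph-engine lookup

print(answer("Where is the Eiffel Tower?"))  # -> Paris, France
```

A production system replaces the exact-match loops with a trained mention detector, a template library covering nested queries, and operator chains executed against the graph engine, followed by ranking over candidate answers.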

DeepQA – The main challenges are query‑document understanding, answer extraction, and answer selection. The end‑to‑end pipeline includes QU (intent filtering and type classification), Recall (top‑N search plus FAQ and IRQA recall), Coarse ranking (paragraph splitting and relevance filtering), Extraction (MRC model “MoTian” trained on search‑domain data), Answer fusion (aggregating answers across paragraphs), Ranking (feature‑based scoring), and Summary generation. Model improvements involve second‑stage domain pre‑training, training‑process tricks (DropPooler, layer‑wise LR decay), data augmentation, and adversarial training (FGM, SMART).
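One of the training tricks above, layer-wise learning-rate decay, is simple to state: lower encoder layers receive exponentially smaller learning rates than upper layers, so well-initialized low-level features change slowly while task-specific top layers adapt quickly. A minimal sketch, with an illustrative decay factor and layer count (not the values used for MoTian):

```python
def layerwise_lrs(base_lr: float, num_layers: int, decay: float) -> list[float]:
    """Per-layer learning rates, bottom to top: the top layer keeps
    base_lr, and each layer below is scaled by `decay` once more."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# 4 layers, 10% decay per layer: bottom layer gets base_lr * 0.9**3
lrs = layerwise_lrs(base_lr=3e-5, num_layers=4, decay=0.9)
print([f"{lr:.2e}" for lr in lrs])
```

In a framework such as PyTorch, these values would typically be wired in as per-layer parameter groups passed to the optimizer.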

IRQA – Unlike DeepQA, IRQA relies on existing QA content from community UGC and vertical PGC sources. The online system performs multi-path recall (keyword, semantic vectors), matching (semantic relevance scoring), and ranking (relevance, timeliness, confidence). Offline, a massive FAQ library is built, filtered through more than 40 quality-control plugins (spam, dead links, relevance, timeliness, etc.). Relevance calculation evolved from handcrafted 33-dimensional features with XGBoost, to interactive BiMPM-style models, and finally to domain-adapted pre-trained models (ELECTRA, MoTian). To serve the billion-scale FAQ library at production QPS, a two-stage distillation (TinyBERT plus FastBERT-style early exit) yields a 23× speedup with negligible accuracy loss.
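The FastBERT-style early-exit idea in the second distillation stage can be sketched as follows: a small classifier after each transformer layer produces a class distribution, and inference stops at the first layer whose prediction entropy falls below a threshold, so easy queries skip the deeper layers. The per-layer probability outputs below are fixed stub values for illustration; a real model would compute them from intermediate hidden states.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit(layer_outputs: list[list[float]], threshold: float):
    """layer_outputs: per-layer class probabilities, shallow to deep.
    Return (layer_index, probs) at the first sufficiently confident
    layer, falling through to the final layer otherwise."""
    for i, probs in enumerate(layer_outputs):
        if entropy(probs) < threshold:
            return i, probs                       # confident: stop here
    return len(layer_outputs) - 1, layer_outputs[-1]

# Easy query: layer 1 is already confident, so layers 2+ are skipped
outputs = [[0.6, 0.4], [0.95, 0.05], [0.99, 0.01]]
layer, probs = early_exit(outputs, threshold=0.3)
print(layer, probs)  # -> 1 [0.95, 0.05]
```

The speedup comes from the fraction of traffic that exits early; the entropy threshold trades latency against accuracy and would be tuned on held-out data.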

The team reflects on two key points: (1) IRQA heavily depends on the quality and volume of external QA content, suggesting stronger collaboration with vertical PGC platforms; (2) DeepQA sometimes suffers from factual inconsistency due to duplicated erroneous sources, indicating a need for tighter integration between KB and DQA to cross‑validate answers.

A short Q&A session addressed practical details such as feature construction for XGBoost, time‑sensitivity labeling, and differences between the pre‑trained models used for query‑level versus document‑level tasks.

The presentation concluded with thanks to the audience and an invitation to join the DataFunTalk community for further AI and big‑data discussions.

Tags: AI, Search Engine, Natural Language Processing, Information Retrieval, Knowledge Graph, Question Answering, Machine Reading Comprehension
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
