
Intelligent Question Answering in QQ Browser Search: Background, Key Technologies, and Frontier Research

This article presents an in‑depth overview of intelligent question answering in QQ Browser search, covering its background, the core KBQA and DeepQA technologies, system architecture, challenges, recent advances such as end‑to‑end, knowledge‑guided and multimodal QA, and practical Q&A for deployment.

DataFunTalk

Intelligent question answering (QA) is a prominent AI direction, widely used in vertical and general search engines, smart customer service, virtual assistants, smartphones, smart speakers, and in-car systems.

Background: In search, QA appears in two main product forms – direct answer cards (Top-1 QA) and interactive QA that helps users refine or extend their queries. A variety of answer types (short, long, list, video, collection, image) is shown in the QQ Browser search UI.

Key Technologies:

Search QA framework – the system processes both structured (knowledge graph) and unstructured (web pages, UGC, PGC) data.

KBQA (knowledge‑graph QA): parses the query, performs graph traversal or triple matching, and uses rule‑based or deep‑learning pipelines to answer factual questions.

DeepQA (search + machine reading comprehension): retrieves candidate documents, applies MRC models to extract answers, and supports short-answer, long-answer, and judgment-type QA. Three DeepQA systems have been built: independent retrieval, web-wide retrieval + online MRC, and end-to-end QA.
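The retrieve-then-read loop at the heart of DeepQA can be sketched in a few lines. Everything here is an illustrative stand-in: the tiny corpus, the term-overlap retriever, and the sentence-picking "reader" are placeholders for the production retrieval stack and MRC models, not their actual logic.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by query-term overlap (stand-in for a real retriever)."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda x: -x[0])
    return [doc for score, doc in scored[:k] if score > 0]

def read(query, docs):
    """Pick the sentence with the most query-term overlap (stand-in for MRC)."""
    q_terms = set(query.lower().split())
    best, best_score = "", -1
    for doc in docs:
        for sent in doc.split(". "):
            score = len(q_terms & set(sent.lower().split()))
            if score > best_score:
                best, best_score = sent, score
    return best

corpus = [
    "Mount Everest is the highest mountain on Earth. It lies in the Himalayas",
    "The Yangtze is the longest river in Asia",
]
print(read("highest mountain on Earth",
           retrieve("highest mountain on Earth", corpus)))
```

The two-function shape is the point: retrieval narrows the corpus cheaply, and the reader spends its budget only on the survivors.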

KBQA Details: Two pipelines are used – a semantic-parsing pipeline (domain classification, syntax tree, logical form) for complex reasoning, and a deep-learning pipeline that detects intent, extracts entities, and performs semantic matching against knowledge-graph triples.
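The deep-learning KBQA path (entity linking, predicate detection, triple lookup) reduces to something like the sketch below. The toy knowledge graph and the string-pattern predicate mapping are hypothetical stand-ins for the production entity linker and semantic-matching model:

```python
# Tiny illustrative knowledge graph of (subject, predicate) -> object triples.
KG = {
    ("Yangtze", "length"): "6300 km",
    ("Yangtze", "mouth"): "East China Sea",
    ("Everest", "height"): "8848.86 m",
}

# Hypothetical question patterns mapped to KG predicates.
PREDICATE_PATTERNS = {
    "how long": "length",
    "how tall": "height",
    "where does": "mouth",
}

def kbqa(query):
    """Link an entity mention, map the question pattern to a predicate,
    then look up the matching triple in the graph."""
    q = query.lower()
    entity = next((s for (s, _p) in KG if s.lower() in q), None)
    predicate = next((p for pat, p in PREDICATE_PATTERNS.items() if pat in q), None)
    return KG.get((entity, predicate))

print(kbqa("How long is the Yangtze?"))  # → 6300 km
```

In production the pattern table is replaced by a learned semantic matcher, but the factorization into (entity, predicate) lookup is the same.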

DeepQA Details:

Early systems (e.g., IBM Watson) had complex pipelines; later open‑domain systems like DrQA introduced a retrieve‑then‑read paradigm.

Challenges in real‑world search: noisy results, diverse document formats, answer normalization, multi‑span answers.

Short‑answer MRC: joint training of answer existence and span prediction, multi‑document interaction, R‑Drop regularization, external knowledge tagging.

Long‑answer MRC: "compositional QA" that selects multiple snippets to form a concise summary; incorporates page‑structure tokens, specialized pre‑training tasks (question selection, node selection), and graph‑network layers.
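Selecting multiple snippets to compose a concise answer can be approximated by a greedy, redundancy-penalized picker (an MMR-style heuristic in the spirit of, though much simpler than, the compositional QA model above; the scoring function and penalty weight are illustrative assumptions):

```python
def select_snippets(query_terms, snippets, k=2, redundancy_penalty=0.5):
    """Greedily pick snippets that cover query terms while penalizing
    word overlap with snippets already chosen."""
    chosen, covered = [], set()
    for _ in range(k):
        def gain(s):
            terms = set(s.lower().split())
            return (len(terms & query_terms)
                    - redundancy_penalty * len(terms & covered))
        best = max((s for s in snippets if s not in chosen), key=gain, default=None)
        if best is None or gain(best) <= 0:
            break
        chosen.append(best)
        covered |= set(best.lower().split())
    return chosen
```

The penalty term is what makes the output a summary rather than k near-duplicates of the single best snippet.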

Judgment-type QA: extracts a long evidence passage, then classifies its stance toward the question to produce a short verdict.

Dense Passage Retrieval: A dual-tower model learns query-passage semantic embeddings from massive web logs and QA pairs; a Barlow-Twins-style objective reduces redundancy across embedding dimensions; cross-batch and hybrid denoising negative sampling improve training efficiency and quality.
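The Barlow-Twins redundancy-reduction term can be sketched in numpy: build the cross-correlation matrix between two embedding views, push its diagonal toward 1 (invariance) and its off-diagonal toward 0 (decorrelated, non-redundant dimensions). How this term is weighted against the contrastive retrieval loss in the production system is not specified here; `lam` below is the standard default from the Barlow Twins paper.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Redundancy-reduction loss over two (batch, dim) embedding views."""
    # Standardize each dimension across the batch.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / z1.shape[0]          # (dim, dim) cross-correlation
    on_diag = ((1 - np.diag(c)) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

Identical views give a near-zero loss (diagonal is exactly 1 after standardization), while unrelated views are heavily penalized on the diagonal term.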

Query-Passage Interaction Model: After a fast non-interactive recall, a more expensive interactive re-ranker refines candidate ordering using multi-stage pre-training, weak supervision from click logs, and self-training on high-confidence auto-labels.
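The recall-then-re-rank cascade itself is simple plumbing. In this sketch the "cheap" scorer looks at a single precomputed feature while the "expensive" one computes a full dot product; in reality these would be a dual-tower similarity and a cross-attention model, but the control flow is the same:

```python
def cascade_rank(query_vec, passages, cheap_score, expensive_score, k=100, n=10):
    """Stage 1: score everything with the cheap (non-interactive) model,
    keep top-k. Stage 2: re-rank only those k with the expensive
    (interactive) model and return top-n."""
    recalled = sorted(passages, key=lambda p: -cheap_score(query_vec, p))[:k]
    return sorted(recalled, key=lambda p: -expensive_score(query_vec, p))[:n]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = (1.0, 1.0)
passages = [(0.9, 0.0), (0.5, 0.6), (0.1, 0.1), (0.8, 0.9)]
cheap = lambda q, p: p[0]  # toy non-interactive model: one feature only
top = cascade_rank(q, passages, cheap, dot, k=3, n=2)
```

Note that the weakest candidate never reaches the expensive scorer at all – that pruning is where the latency savings come from.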

Frontier Research:

End-to-end QA – three generations: retrieve-then-read, jointly optimized retriever-reader, and fully generative models (T5, GPT-3). Hard-EM treats the answer-bearing document as a latent variable, training on only the highest-scoring candidate whose text hard-matches the gold answer.
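The Hard-EM selection step (the "E" step) can be sketched as follows; the model scores and the answer-containment check below are placeholders for the retriever's scores and a string hard-match against the gold answer:

```python
def hard_em_target(candidates, model_score, contains_answer):
    """Among candidate passages, keep only the highest-scoring one that
    actually contains the gold answer; the loss is computed on it alone."""
    positives = [c for c in candidates if contains_answer(c)]
    if not positives:
        return None  # no trainable example for this question
    return max(positives, key=model_score)

cands = ["doc a", "doc b with answer", "doc c with answer"]
score = {"doc a": 0.9, "doc b with answer": 0.4, "doc c with answer": 0.7}
target = hard_em_target(cands, score.get, lambda c: "answer" in c)
# target == "doc c with answer"
```

Compared with marginalizing over all answer-bearing passages, the hard argmax gives a sparser, lower-variance training signal.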

Knowledge‑guided QA – integrating entity triples via soft position encoding (K‑BERT) or multi‑task knowledge‑enhanced pre‑training with a frozen base model and a learnable knowledge memory matrix.
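K-BERT's soft position encoding is concrete enough to sketch: injected triple tokens branch off the entity's position rather than shifting the rest of the sentence, so the original word order is preserved for the sentence tokens. The tokenization and injection format below are simplified assumptions:

```python
def soft_positions(tokens, injections):
    """K-BERT-style soft position ids. `injections` maps a sentence-token
    index to the list of knowledge tokens inserted after it."""
    seq, pos = [], []
    p = 0
    for i, tok in enumerate(tokens):
        seq.append(tok)
        pos.append(p)
        for j, btok in enumerate(injections.get(i, []), start=1):
            seq.append(btok)
            pos.append(p + j)  # branch continues from the entity position
        p += 1                 # next sentence token ignores the branch
    return seq, pos

# Inject the triple tail (CEO, Apple) after the entity "Cook":
seq, pos = soft_positions(["Cook", "visits", "Beijing"], {0: ["CEO", "Apple"]})
# seq == ["Cook", "CEO", "Apple", "visits", "Beijing"], pos == [0, 1, 2, 1, 2]
```

In the full model a visibility matrix additionally prevents the injected branch from attending to unrelated sentence tokens; only the position trick is shown here.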

Multimodal QA – converting a video's audio track and subtitles to text, then applying reading comprehension plus generation to produce video-answer summaries and key-frame tags.

Sample Q&A:

Q1: Execution order of KBQA and DeepQA? A: They run in parallel; a downstream decision module selects the final answer.

Q2: How to speed up DeepQA in production? A: Use a hierarchical retrieval pipeline (fast non‑interactive recall → slower interactive re‑rank) and limit MRC to top‑N documents.

Q3: When and how is query spelling correction triggered? A: At the earliest query‑analysis stage; high‑confidence corrections may replace the query directly, otherwise a second‑pass search merges original and corrected results.

Thank you for your attention.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
