
Search Relevance Architecture and Practices in QQ Browser

The QQ Browser search relevance team describes a unified, billion‑scale architecture that combines a main and vertical subsystem, a pyramid‑shaped ranking pipeline (recall, coarse, fine), a dedicated GPU‑accelerated relevance service, and hybrid semantic‑matching models (dual‑tower, BERT, matrix fusion) evaluated with offline and online metrics to deliver accurate, fresh, and authoritative results for diverse content and long‑tail queries.

Tencent Cloud Developer

Search relevance measures the degree of match between a query (Query) and a document (Doc) and is a core task of information retrieval. This article, authored by Liu Jie, shares the practical experience of the QQ Browser search relevance team, covering system architecture, algorithm design, and the integration of the QQ Browser and Sogou search systems.

Business Overview: QQ Browser’s search is a comprehensive web search service serving billions of users daily, returning not only traditional web pages and images but also rich media such as mini‑programs, WeChat articles, video cards, and intelligent Q&A.

System Framework: The platform consists of two major subsystems – the Main Search subsystem (evolved from Sogou) and the General Vertical Search subsystem (evolved from Kankan). Both subsystems perform retrieval on a billion‑scale index and feed results to a top‑level fusion ranking.

The logical layers are:

Fusion system: whole‑page heterogeneous ranking of natural results and vertical‑specific cards, including click‑prediction and lightweight re‑ranking.

General vertical search subsystem: fast‑deployment, high‑iteration vertical services with a focus on specialized result types.

Main search subsystem: stable, large‑scale retrieval of traditional web pages and images, focusing on long‑tail user queries.

Algorithm Architecture: The ranking pipeline follows a pyramid‑shaped funnel:

Recall layer: textual recall via inverted index and vector recall via deep models that map Query and Doc into a latent space.

Coarse‑ranking layer: low‑cost feature extraction, including relevance features (textual and semantic), static Query/Doc features, and statistical user‑behavior features.

Fine‑ranking layer: multi‑objective scoring (relevance, freshness, authority, click‑prediction) to produce the final ordered list.

The relevance computation is split into coarse relevance (used to filter thousands of candidates down to hundreds) and fine relevance (used to filter hundreds down to the final handful). Coarse relevance relies mainly on inverted‑index features and a dual‑tower semantic model; fine relevance incorporates richer Doc‑side features and higher‑cost models.
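The dual‑tower setup used for coarse relevance can be sketched as follows. This is a toy illustration under stated assumptions: `encode` is a hypothetical stand‑in for a trained tower network, and the key property shown is that Query and Doc are embedded independently, so Doc vectors can be precomputed offline.

```python
import math

def encode(text, dim=4):
    """Toy stand-in for a tower encoder: hashes tokens into a dense vector.
    The production system would use a trained deep network here."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(q_vec, d_vec):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm if norm else 0.0

def coarse_relevance(query, doc_title):
    # Query and Doc towers run independently; only the final cosine
    # couples them, which keeps serving cost low for thousands of docs.
    return cosine(encode(query), encode(doc_title))
```

Because the two towers never interact before the final similarity, this paradigm trades matching precision for throughput, which is exactly the coarse‑ranking trade‑off described above.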

Evaluation System: Both offline and online metrics are used.

PNR (Positive‑Negative Ratio) – a pairwise metric: the ratio of correctly ordered pairs to incorrectly ordered pairs.

DCG (Discounted Cumulative Gain) – a listwise metric rewarding higher relevance at top positions.

Interleaving – online click‑preference experiments that interleave two ranking lists.

GSB (Good‑Same‑Bad) – expert side‑by‑side labeling of two ranking outputs.
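The two offline metrics above can be computed as follows. This is a minimal sketch assuming a common formulation (pairwise ordering ratio for PNR, exponential‑gain DCG); the team's exact gain and discount functions are not specified in the article.

```python
import math

def pnr(labels, scores):
    """Positive-Negative Ratio: correctly ordered pairs / incorrectly ordered pairs."""
    pos = neg = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                continue  # ties carry no ordering information
            # the pair is correct if the higher-labeled item also scores higher
            if (labels[i] - labels[j]) * (scores[i] - scores[j]) > 0:
                pos += 1
            else:
                neg += 1
    return pos / neg if neg else float("inf")

def dcg(relevances, k=None):
    """Discounted Cumulative Gain: rewards high relevance grades at top positions."""
    rels = relevances[:k] if k else relevances
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
```

PNR is listwise‑agnostic (only pair order matters), while DCG's log discount makes the top positions dominate – which is why both are tracked.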

System Evolution:

1.0 era (fragmented systems) – faced challenges of incomparable relevance scores and different modeling philosophies ("big‑unified" vs. "abstract high‑level features").

2.0 era – unified relevance service with 90% code reuse, providing calibrated relevance scores, GPU‑accelerated feature computation, and reduced experiment latency (from weeks to days).

Relevance Service: A dedicated side‑service for fine‑ranking that delivers high‑level relevance scores, supports GPU parallelism, and separates relevance from the recall layer, enabling faster experimentation and more stable feature management.

Deep Semantic Matching:

Two matching paradigms: representation‑based (dual‑tower cosine similarity) and interaction‑based (BERT with CLS token).

Ranking loss strategies: pointwise (5‑class relevance), pairwise (order‑preserving), and hybrid approaches combining both.

Calibration of semantic features (Probability Calibration) to make scores comparable across queries and business lines.
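The pointwise and pairwise objectives above can be sketched in plain Python. These are illustrative formulations (cross‑entropy over 5 grades, margin hinge for ordering); the `alpha` weighting in the hybrid loss is a hypothetical example, not the team's actual recipe, and production training would operate on batched tensors.

```python
import math

def pointwise_loss(logits, label):
    """Pointwise: cross-entropy over 5 relevance grades (label in 0..4)."""
    m = max(logits)  # subtract max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise: penalize when the more relevant doc does not outscore
    the less relevant one by at least `margin` (order-preserving)."""
    return max(0.0, margin - (score_pos - score_neg))

def hybrid_loss(logits, label, score_pos, score_neg, alpha=0.5):
    # alpha trades absolute grade accuracy against pair ordering;
    # the 50/50 split here is illustrative only.
    return alpha * pointwise_loss(logits, label) + (1 - alpha) * pairwise_hinge_loss(score_pos, score_neg)
```

The pointwise term anchors scores to absolute grades (which helps calibration across queries), while the pairwise term directly optimizes the ordering that PNR measures – hence the hybrid.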

Relevance vs. Semantic Matching: Relevance Matching emphasizes exact keyword hits and core‑term identification, while Semantic Matching focuses on overall term/phrase similarity. Both are needed for robust search.

Hybrid Matrix Matching (HMM):

Constructs an implicit semantic matching matrix from BERT token embeddings (dense + cosine) and an explicit exact‑match matrix from tokenized Query‑Title hits.

Uses 3D‑CNN with weighted‑sum pooling to fuse both matrices, then concatenates with BERT CLS vector for final scoring.

Offline experiments show 1.8%‑2.3% improvements in PNR/NDCG; online A/B tests and interleaving confirm significant gains.
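The two matching matrices at the heart of HMM can be sketched like this. The embeddings here are toy vectors standing in for BERT token embeddings, and the 3D‑CNN fusion and CLS concatenation are omitted; only the matrix construction is shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two token embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_matrix(query_embs, title_embs):
    """Implicit matrix: cosine similarity for every (Query token, Title token) pair."""
    return [[cosine(q, t) for t in title_embs] for q in query_embs]

def exact_match_matrix(query_tokens, title_tokens):
    """Explicit matrix: 1.0 where a tokenized Query term literally hits the Title."""
    return [[1.0 if q == t else 0.0 for t in title_tokens] for q in query_tokens]
```

Stacking the implicit (soft, semantic) and explicit (hard, exact‑hit) matrices gives the fusion network both signals the Relevance‑vs‑Semantic‑Matching discussion argues are necessary.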

Conclusion: Search relevance in QQ Browser is a technically demanding area involving massive scale, diverse content types, and long‑tail queries. The team has unified two legacy systems, built a high‑availability relevance service, and continuously incorporated state‑of‑the‑art AI techniques (pre‑trained language models, domain‑adapted training, calibrated ranking losses) to improve user experience.

Tags: system architecture, deep learning, evaluation metrics, information retrieval, ranking algorithms, search relevance
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
