Zhihu Search Text Relevance Evolution and BERT Knowledge Distillation Practices
This talk by Zhihu search algorithm engineer Shen Zhan details the evolution of text relevance models from TF‑IDF/BM25 to deep semantic matching and BERT, explains the challenges of deploying BERT at scale, and describes practical knowledge‑distillation techniques that cut online latency and offline storage costs while maintaining search quality.
The presentation begins with an overview of Zhihu's search text relevance, defining relevance as the match between user query intent and retrieved document content, and distinguishing between literal matching and semantic relevance.
It then traces the evolution of relevance models in three stages: (1) early bag‑of‑words approaches using TF‑IDF/BM25, (2) deep semantic matching models such as dual‑tower encoders (e.g., DSSM) and interaction models (e.g., Match‑Pyramid, KNRM), and (3) the adoption of BERT, which provides powerful contextual representations for both representation and interaction models.
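The first-stage scoring can be illustrated with a minimal Okapi BM25 sketch. This is a generic textbook formulation, not Zhihu's production ranker; the function and parameter names are illustrative.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25.

    doc_freqs maps each term to the number of documents containing it.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        # Inverse document frequency with the standard BM25 smoothing.
        idf = math.log((num_docs - doc_freqs.get(term, 0) + 0.5)
                       / (doc_freqs.get(term, 0) + 0.5) + 1.0)
        # Term-frequency saturation plus document-length normalization.
        denom = tf[term] + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * tf[term] * (k1 + 1) / denom
    return score
```

Because BM25 only rewards literal term overlap, a document phrased with synonyms scores zero, which is exactly the gap the later semantic-matching stages address.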
The speaker discusses the practical deployment of BERT in Zhihu's search pipeline, noting the high computational cost of interaction models and the trade‑off of using representation models with offline‑precomputed document vectors and online query encoding.
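The offline/online split behind representation models can be sketched as follows. This is a hedged NumPy illustration of the serving pattern only: the encoder itself is omitted, and `build_index`/`search` are hypothetical names, not Zhihu's APIs.

```python
import numpy as np

def _normalize(v):
    # L2-normalize rows so that a dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def build_index(doc_vectors):
    # Offline: documents are encoded once (e.g. by the document tower of a
    # dual-tower model) and stored as a normalized matrix.
    return _normalize(np.asarray(doc_vectors, dtype=np.float32))

def search(index, query_vector, top_k=5):
    # Online: only the query is encoded at request time; retrieval is then a
    # single matrix-vector product over the precomputed index.
    q = _normalize(np.asarray(query_vector, dtype=np.float32))
    scores = index @ q
    order = np.argsort(-scores)[:top_k]
    return list(zip(order.tolist(), scores[order].tolist()))
```

This split is what makes representation models cheap to serve relative to interaction models, which must run the full network on every query-document pair.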
Knowledge distillation is introduced as a solution to reduce model size and latency. The talk explains the concept of soft targets and temperature scaling, and reviews common distillation schemes, including MiniLM and teacher‑student frameworks.
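Soft targets and temperature scaling can be made concrete with a short sketch of a Hinton-style distillation loss. This is a generic formulation, not the talk's exact implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student outputs.

    A higher temperature flattens the teacher distribution, exposing the
    relative probabilities it assigns to "wrong" classes (soft targets).
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(temperature ** 2 * np.sum(p * (np.log(p) - np.log(q))))
```

In practice this soft-target term is usually mixed with an ordinary cross-entropy loss on the ground-truth labels.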
Specific distillation experiments are described: using larger teacher models (e.g., RoBERTa‑large) to train a 6‑layer student model, applying Patient KD with combined cross‑entropy and normalized MSE losses, and compressing vector dimensions from 768 to 64 while preserving retrieval performance.
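The combined objective described above, soft-target cross-entropy plus a normalized MSE over intermediate layers in the Patient-KD style, might look like the following sketch. The layer pairing, loss weight `alpha`, and temperature are assumptions for illustration, not the reported settings.

```python
import numpy as np

def _softmax(logits, temperature):
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def normalized_mse(student_hidden, teacher_hidden):
    # MSE between L2-normalized hidden states, so scale differences between
    # student and teacher layers do not dominate the loss.
    s = np.asarray(student_hidden, dtype=np.float64)
    t = np.asarray(teacher_hidden, dtype=np.float64)
    s = s / np.linalg.norm(s)
    t = t / np.linalg.norm(t)
    return float(np.mean((s - t) ** 2))

def patient_kd_loss(student_logits, teacher_logits, layer_pairs,
                    alpha=0.5, temperature=2.0):
    """Soft-target cross-entropy plus normalized MSE over paired hidden layers.

    layer_pairs: list of (student_hidden, teacher_hidden) arrays, where each
    student layer is matched to a chosen teacher layer (e.g. every other one
    when distilling a 12-layer teacher into a 6-layer student).
    """
    p = _softmax(teacher_logits, temperature)   # teacher soft targets
    q = _softmax(student_logits, temperature)
    soft_ce = float(-np.sum(p * np.log(q)))     # cross-entropy on soft targets
    hidden = float(np.mean([normalized_mse(s, t) for s, t in layer_pairs]))
    return alpha * soft_ce + (1.0 - alpha) * hidden
```

The same normalization idea helps when shrinking vector dimensions (e.g. 768 to 64), since cosine-based retrieval cares about direction rather than scale.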
Results show significant gains: online latency reduced by ~40 ms, GPU usage halved, storage for semantic indexes cut by up to 75%, and offline indexing time reduced to one‑quarter, all with minimal loss in relevance metrics (nDCG comparable to or better than the original BERT‑base).
The session concludes with a summary of the benefits of BERT distillation for both online serving and offline indexing, and thanks the audience.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.