
Zhihu Recommendation Page Ranking: Architecture, Feature Design, Model Evolution, and Practical Insights

This article gives an overview of Zhihu's recommendation page ranking system. It covers the request flow, the evolution of ranking from time-based ordering to deep-learning models, feature engineering strategies, model designs including DNN, DeepFM, DIN, and multi-task learning, and lessons learned from production deployment.

DataFunSummit

Aug 29, 2021

The article introduces Zhihu's recommendation page and the history of its ranking system, outlining the three‑stage request flow: recall (topic‑based and content‑based), ranking (rule‑based and model‑based using GBDT and DNN), and re‑ranking for product considerations.
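The three-stage flow above can be sketched as three composable functions. This is a minimal illustration, not Zhihu's implementation: the function names, the topic-keyed index, the `score_fn` callback, and the per-topic cap used as the re-ranking rule are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical candidate record: an item id plus the topic that recalled it.
Candidate = Dict[str, str]


def recall(user_topics: List[str], index: Dict[str, List[str]]) -> List[Candidate]:
    """Topic-based recall: collect items from each followed topic, deduplicated."""
    seen = set()
    out = []
    for topic in user_topics:
        for item_id in index.get(topic, []):
            if item_id not in seen:
                seen.add(item_id)
                out.append({"item_id": item_id, "topic": topic})
    return out


def rank(cands: List[Candidate], score_fn: Callable[[Candidate], float]) -> List[Candidate]:
    """Model-based ranking: order candidates by a predicted score, highest first."""
    return sorted(cands, key=score_fn, reverse=True)


def rerank(ranked: List[Candidate], max_per_topic: int) -> List[Candidate]:
    """Assumed product rule: cap the number of items shown per topic."""
    counts: Dict[str, int] = {}
    page = []
    for cand in ranked:
        topic = cand["topic"]
        if counts.get(topic, 0) < max_per_topic:
            counts[topic] = counts.get(topic, 0) + 1
            page.append(cand)
    return page


def recommend(user_topics, index, score_fn, max_per_topic=2):
    """Run recall, ranking, and re-ranking in sequence."""
    return rerank(rank(recall(user_topics, index), score_fn), max_per_topic)
```

Keeping the stages separate means a ranking model can be swapped (for example GBDT for DNN) without touching the recall sources or the product rules applied afterwards.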

It describes the evolution of ranking algorithms through four stages: simple time‑ordering, EdgeRank inspired by Facebook, Feed Ranking using GBDT, and Global Ranking employing deep‑learning models (DNN, embedding).

Features fall into three categories: user profile, content profile, and cross features. They are represented in several ways, including numeric, one-hot, multi-hot, and value-weighted encodings. The design principles are feature completeness, preservation of raw values, high coverage, and consistency between the online and offline pipelines.
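The encodings named above can be illustrated with a few small helpers. This is a sketch under assumptions: the vocabulary, the function names, and the reading of "value-weighted" as a multi-hot vector that stores a weight instead of 1 are illustrative choices, not details taken from the talk.

```python
from typing import Dict, List


def one_hot(value: str, vocab: List[str]) -> List[float]:
    """Single categorical value: 1.0 at its vocabulary position, 0.0 elsewhere."""
    return [1.0 if v == value else 0.0 for v in vocab]


def multi_hot(values: List[str], vocab: List[str]) -> List[float]:
    """Set-valued feature: 1.0 at every position whose value is present."""
    present = set(values)
    return [1.0 if v in present else 0.0 for v in vocab]


def value_weighted(weights: Dict[str, float], vocab: List[str]) -> List[float]:
    """Multi-hot with weights (assumed meaning): store the weight, not just 1.0."""
    return [float(weights.get(v, 0.0)) for v in vocab]
```

For a hypothetical topic vocabulary `["ai", "db", "web"]`, a user's interest profile `{"ai": 0.7, "web": 0.2}` becomes `[0.7, 0.0, 0.2]`, which keeps the raw interest strength available to the model.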

New feature directions include explicit cross features, features motivated by business insight, and content embeddings derived from skip-gram models. The skip-gram models are trained with NCE (noise-contrastive estimation) loss on a large corpus of search-behavior sessions (85 billion samples).
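To make the skip-gram setup concrete, the sketch below turns one behavior session into (center, context) training pairs. The session format, the window size, and the function name are assumptions; the source does not describe how sessions are built or which window is used. In a full pipeline these pairs would be fed to an embedding model trained with an NCE-style loss against sampled negatives.

```python
from typing import List, Tuple


def skipgram_pairs(session: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Emit (center, context) pairs for items within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an item is not its own context
                pairs.append((center, session[j]))
    return pairs
```

Items that co-occur in many sessions end up with similar embeddings, which is what makes the vectors usable as content features in the ranking model.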

The CTR model uses cross-entropy loss with a sigmoid output. The initial DNN architecture processes user features and content features in separate blocks, concatenates their hidden layers, and feeds the result into fully connected layers, reaching an AUC of 0.7618. Subsequent optimizations (block-wise DNN, DeepFM, which adds FM and first-order terms, DIN with topic-based attention, GRU-based Last Display, and multi-task learning) incrementally raise AUC to 0.7678.
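The loss and the FM component mentioned above can be written out in plain Python. This is a sketch of the standard factorization-machine logit (bias, first-order terms, and pairwise interactions through latent vectors) followed by a sigmoid and binary cross-entropy. It omits the deep part of DeepFM, and the parameter names `w0`, `w`, and `v` are illustrative rather than taken from the talk.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(y: int, p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one example; p is clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fm_logit(x: List[float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """FM logit: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    The pairwise sum uses the identity
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    which avoids the explicit double loop over feature pairs.
    """
    first_order = w0 + sum(wi * xi for wi, xi in zip(w, x))
    second_order = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        second_order += 0.5 * (s * s - sq)
    return first_order + second_order
```

A predicted click probability is then `sigmoid(fm_logit(x, w0, w, v))`, and `log_loss(label, probability)` is the per-example training loss.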

The practical lessons cover handling statistical features that change over time, keeping offline and online features consistent, normalizing features (log-scaling large values), checking for illegal values, caching user-side computations, handling large-scale data with FlatBuffer on HDFS, and automating model updates.
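Two of these lessons, illegal value checks and log-scaling, fit in a few lines. The sketch below is one possible approach under assumptions: it treats missing, non-numeric, NaN, infinite, and negative values as illegal for count-like features and replaces them with a default, then applies `log(1 + x)`. The source does not specify which checks or which log transform were used.

```python
import math


def sanitize(value, default: float = 0.0) -> float:
    """Return a finite, non-negative float, or `default` for illegal input."""
    if value is None:
        return default
    try:
        v = float(value)
    except (TypeError, ValueError):
        return default
    if math.isnan(v) or math.isinf(v) or v < 0:
        return default
    return v


def log_scale(value) -> float:
    """Compress large counts with log(1 + x) after sanitizing the input."""
    return math.log1p(sanitize(value))
```

Running the same two functions in both the offline training job and the online feature service is one simple way to keep the two pipelines consistent.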

The challenges identified include the differences between recommendation pages and search pages, pointwise CTR models that ignore interactions between items, and user fatigue from repetitive content. As a future direction, the article proposes reinforcement learning with an actor-critic architecture to capture user feedback and to recommend a whole screen of items at once.

