Zhihu Recommendation Page Ranking: Architecture, Feature Engineering, Model Evolution, and Future Directions
This article presents a comprehensive overview of Zhihu's recommendation page ranking system, covering its request flow, historical ranking evolution, feature design, deep learning models, multi‑task CTR optimization, practical engineering insights, current challenges, and future research directions such as reinforcement learning.
This talk, presented by Zhihu ranking algorithm lead Dan Houzhi and organized by DataFun AI Talk, shares the experience of Zhihu's recommendation page ranking.
Key topics include: the Zhihu recommendation page scenario and ranking history; attempts at, and the current status of, deep learning in ranking; and existing problems and future research directions.
1. Recommendation page request flow
The pipeline consists of three stages:
Recall: extracts a broad set of candidate items based on user interests (topic-based or content-based collaborative filtering).
Ranking: scores recalled items using rule-based (time, linear weighting) or model-based (GBDT, DNN) methods.
Re-ranking: applies business-driven adjustments such as promotion, isolation, or forced insertion before final display.
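The three stages can be sketched as a chain of small functions. This is a minimal illustration, not Zhihu's implementation: the item records, the `score` field, and the function names are all hypothetical, recall is reduced to a topic match, and re-ranking shows only the forced insertion of pinned items.

```python
from typing import Callable, Dict, List, Set

Item = Dict[str, object]  # hypothetical record, e.g. {"id": "a", "topic": "ml", "score": 0.2}

def recall(user_topics: Set[str], corpus: List[Item]) -> List[Item]:
    """Recall: keep items whose topic matches one of the user's interests."""
    return [item for item in corpus if item["topic"] in user_topics]

def rank(items: List[Item], score_fn: Callable[[Item], float]) -> List[Item]:
    """Ranking: order recalled items by a score (rule-based or model-based)."""
    return sorted(items, key=score_fn, reverse=True)

def rerank(items: List[Item], pinned: List[Item]) -> List[Item]:
    """Re-ranking: force-insert pinned items at the top, without duplicates."""
    pinned_ids = {item["id"] for item in pinned}
    return pinned + [item for item in items if item["id"] not in pinned_ids]

def recommend(user_topics: Set[str], corpus: List[Item],
              score_fn: Callable[[Item], float],
              pinned: List[Item], top_k: int) -> List[Item]:
    """Run recall, ranking, and re-ranking, then truncate to one screen."""
    candidates = recall(user_topics, corpus)
    ranked = rank(candidates, score_fn)
    return rerank(ranked, pinned)[:top_k]
```

In a production system `score_fn` would be the GBDT or DNN model described later; here any callable that maps an item to a number will do.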
2. Ranking evolution
Ranking evolved through four major stages: (1) time-based sorting; (2) an EdgeRank-style algorithm that incorporates user intimacy; (3) Feed Ranking using GBDT models; and (4) Global Ranking based on deep learning (DNN) models.
3. Feature introduction
Features fall into three groups: user profile features (attributes, statistics), content profile features (length, keywords, historical likes), and cross features (user-topic × content-topic interactions). Feature formats include numeric, one-hot, multi-hot, one-hot with value, and multi-hot with value.
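To make the feature formats concrete, the sketch below encodes each of them as a dense NumPy vector. The helper names and the dense representation are assumptions for illustration; a real system would more likely store sparse index/value pairs.

```python
import numpy as np

def one_hot(index: int, size: int, value: float = 1.0) -> np.ndarray:
    """One-hot (value=1.0) or one-hot with value (any other value)."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = value
    return vec

def multi_hot(indices, size: int, values=None) -> np.ndarray:
    """Multi-hot (values=None) or multi-hot with value (one value per index)."""
    vec = np.zeros(size, dtype=np.float32)
    if values is None:
        values = [1.0] * len(indices)
    for i, v in zip(indices, values):
        vec[i] = v
    return vec

# Hypothetical examples:
#   one_hot(2, 4)                         a single category, such as a content type
#   one_hot(1, 3, value=0.5)              one category plus a strength
#   multi_hot([0, 3], 4)                  a set of topics
#   multi_hot([0, 2], 3, [0.25, 0.75])    topics with interest weights
```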
4. Feature design principles
Features should be as complete as possible, retain raw values, have high coverage, and maintain consistency between offline training and online serving.
5. New feature directions
New feature directions include explicit cross features that reduce the model's search space, business-driven features (e.g., video click propensity under Wi-Fi), and embedding-based features.
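As one possible reading of an explicit cross feature, the sketch below crosses a user's topic interests with a content item's topics and hands the model the result directly, so it does not have to learn the interaction itself. The dictionary layout and the two output feature names are hypothetical.

```python
def topic_cross_features(user_topics: dict, content_topics: dict) -> dict:
    """Cross user-topic weights with content-topic weights.

    Both arguments map topic name -> weight (a multi-hot-with-value feature).
    """
    shared = set(user_topics) & set(content_topics)
    overlap_score = sum(user_topics[t] * content_topics[t] for t in shared)
    return {
        "matched_topic_count": len(shared),   # how many topics overlap
        "topic_overlap_score": overlap_score,  # weighted strength of the overlap
    }
```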
5.1 Content embedding
Embedding maps items to a low-dimensional space where similar items are close. Methods include text-based approaches (TF-IDF, Word2Vec) and behavior-based approaches, which train a skip-gram model with NCE loss on session-based item sequences.
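For the behavior-based route, the first step is turning each session into (center, context) training pairs for a skip-gram model. The sketch below shows that step and a cosine similarity for comparing the resulting embeddings; the window size and function names are assumptions, and the model training with NCE loss is left out.

```python
import numpy as np

def skipgram_pairs(session, window: int = 2):
    """Build (center, context) pairs from one session's item sequence."""
    pairs = []
    for i, center in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, session[j]))
    return pairs

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two item embeddings; close to 1.0 for similar items."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Items that co-occur within the window across many sessions end up with nearby embeddings, which is what makes the embedding usable as a similarity feature.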
6. CTR model
The ranking objective can be stay‑time regression or click‑through‑rate (CTR) classification; the latter is a binary classification problem solved with cross‑entropy loss.
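For the CTR formulation, the loss is the standard binary cross-entropy between the click label and the predicted click probability. The sketch below is a plain NumPy version; the clipping constant `eps` is an assumption added for numerical safety.

```python
import numpy as np

def sigmoid(z):
    """Map a raw model score (logit) to a click probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=np.float64)))

def binary_cross_entropy(labels, probs, eps: float = 1e-7) -> float:
    """Mean cross-entropy loss; labels are 1 for a click and 0 for no click."""
    labels = np.asarray(labels, dtype=np.float64)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    losses = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-np.mean(losses))
```

A model that predicts 0.5 for every impression scores ln 2 on this loss, which is a useful baseline when reading training curves.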
6.1 Model structures
Initial DNN: separate user and content blocks, each passed through fully-connected layers, concatenated, and fed to two more layers with sigmoid output.
Optimized DNN: block-wise feature groups, each with its own hidden layer before concatenation.
DeepFM: adds a first-order module and an FM module; FM computes inner-product interactions between blocks, improving AUC by ~0.2%.
Last View + DIN: uses attention over topics of the last viewed items to weight current item embeddings.
Last Display + GRU: incorporates both clicked and non-clicked displayed items via GRU before feeding into DNN.
Multi-task learning: shares lower-layer weights across several objectives (CTR, favorite, like, comment, etc.) with a weighted loss, improving secondary metrics while keeping CTR stable.
Final model: combines the above components into a unified architecture.
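Two of these components reduce to short formulas. The sketch below shows the FM second-order term as the sum of inner products over all pairs of block embeddings, and the multi-task objective as a weighted sum of per-task losses. The function names, the shared embedding dimension, and the example weights are assumptions, not values from the talk.

```python
import numpy as np

def fm_second_order(block_embeddings: np.ndarray) -> float:
    """Sum of <v_i, v_j> over all block pairs i < j.

    block_embeddings has shape (num_blocks, dim). Uses the identity
    sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2),
    which avoids looping over every pair.
    """
    square_of_sum = np.square(block_embeddings.sum(axis=0)).sum()
    sum_of_squares = np.square(block_embeddings).sum()
    return float(0.5 * (square_of_sum - sum_of_squares))

def multitask_loss(task_losses: dict, task_weights: dict) -> float:
    """Weighted sum of per-task losses, e.g. CTR, favorite, like, comment."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())

# Hypothetical usage: keep CTR dominant, down-weight secondary objectives.
# multitask_loss({"ctr": 0.5, "like": 0.2}, {"ctr": 1.0, "like": 0.5})
```

The task weights are the main tuning knob: raising a secondary task's weight trades some CTR focus for that metric.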
7. Experience sharing
Key engineering tips: record online statistical features at request time to avoid leakage; ensure offline‑online feature consistency; apply log transformation to large numeric features; check for NaN/Inf; cache user‑side computations; use large training data stored in FlatBuffer on HDFS; keep models auto‑updating.
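Two of these tips are easy to show in code: compressing large numeric features with a log transform, and rejecting NaN/Inf values before they reach training or serving. This is a minimal sketch; the helper names are hypothetical, and `log1p` is chosen on the assumption that the features are non-negative counts.

```python
import numpy as np

def log_transform(values) -> np.ndarray:
    """Compress large non-negative counts (e.g. historical likes) with log(1 + x)."""
    return np.log1p(np.asarray(values, dtype=np.float64))

def has_invalid(values) -> bool:
    """Return True if any value is NaN or +/-Inf."""
    return not bool(np.isfinite(np.asarray(values, dtype=np.float64)).all())
```

Running the same two helpers in the offline pipeline and the online service is one simple way to keep feature processing consistent between the two.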
8. Current challenges
Recommendation differs from search: no explicit query, requiring both relevance and diversity. Pointwise CTR models ignore interactions among items displayed together. User fatigue from repeated similar topics needs diversified recommendations.
9. Future directions
Reinforcement learning with an actor-critic framework is one direction: the actor generates whole-screen recommendations based on past behavior, the critic receives click/reward signals, and the two networks are trained jointly. The aim is to capture real-time feedback and avoid fatigue, at the cost of higher model complexity and training difficulty.
Author and recruitment
Dan Houzhi, senior ranking engineer at Zhihu, shares his background and invites interested candidates to contact via the provided email and WeChat QR code.
