Zhihu Recommendation Page Ranking: Architecture, Feature Engineering, Model Evolution, and Future Directions

This article presents a comprehensive overview of Zhihu's recommendation page ranking system, covering its request flow, historical ranking evolution, feature design, deep learning models, multi‑task CTR optimization, practical engineering insights, current challenges, and future research directions such as reinforcement learning.

DataFunTalk

This talk, presented by Zhihu ranking algorithm lead Dan Houzhi and organized by DataFun AI Talk, shares the experience of Zhihu's recommendation page ranking.

Key topics include the Zhihu recommendation page scenario and its ranking history, the attempts at and current status of deep learning in ranking, and existing problems and future research directions.

1. Recommendation page request flow

The pipeline consists of three stages:

Recall – extracts a broad set of candidate items based on user interests (topic-based or content-based collaborative filtering).

Ranking – scores the recalled items using rule-based methods (time, linear weighting) or model-based methods (GBDT, DNN).

Re-ranking – applies business-driven adjustments such as promotion, isolation, or forced insertion before final display.
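The three stages above can be sketched as a small pipeline. This is a minimal illustration, not Zhihu's implementation: the `Item` fields, the topic-match recall, the time-based score, and the "promoted items first" rule are all hypothetical stand-ins for the real recall sources, ranking model, and business rules.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    item_id: str
    topic: str
    age_hours: float       # hours since publication (hypothetical field)
    promoted: bool = False  # hypothetical business flag


def recall(user_topics: List[str], corpus: List[Item], limit: int = 100) -> List[Item]:
    """Recall: keep items whose topic matches one of the user's interest topics."""
    wanted = set(user_topics)
    return [item for item in corpus if item.topic in wanted][:limit]


def rank(candidates: List[Item], score_fn: Callable[[Item], float]) -> List[Item]:
    """Ranking: order candidates by a score (a rule or a model prediction)."""
    return sorted(candidates, key=score_fn, reverse=True)


def rerank(ranked: List[Item]) -> List[Item]:
    """Re-ranking: a stand-in business rule that moves promoted items to the front."""
    promoted = [item for item in ranked if item.promoted]
    organic = [item for item in ranked if not item.promoted]
    return promoted + organic


def recommend(user_topics, corpus, score_fn, page_size=3):
    return rerank(rank(recall(user_topics, corpus), score_fn))[:page_size]


corpus = [
    Item("a", "ml", age_hours=5.0),
    Item("b", "ml", age_hours=1.0),
    Item("c", "cooking", age_hours=0.5),
    Item("d", "systems", age_hours=9.0, promoted=True),
]

# Time-based rule as the scoring function: newer items score higher.
page = recommend(["ml", "systems"], corpus, score_fn=lambda item: -item.age_hours)
print([item.item_id for item in page])
```

Passing the scoring function as a parameter mirrors the point made in the text: the ranking stage can swap a rule for a GBDT or DNN prediction without touching recall or re-ranking.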

2. Ranking evolution

The ranking system went through four major stages: time-based sorting; an EdgeRank-style algorithm that incorporates user intimacy; Feed Ranking using GBDT models; and Global Ranking based on deep learning (DNN) models.

3. Feature introduction

Features fall into three groups: user profile features (attributes, statistics), content profile features (length, keywords, historical likes), and cross features (user-topic × content-topic interactions). Feature formats include numeric, one-hot, multi-hot, one-hot with value, and multi-hot with value.
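The five feature formats can be illustrated with dense NumPy vectors. This is only a sketch: the topic vocabulary, the example values, and the helper names `one_hot` and `multi_hot` are assumptions made for illustration, and a production system would more likely use sparse (index, value) pairs.

```python
import numpy as np


def one_hot(index, size, value=1.0):
    """One-hot (optionally with value): a single non-zero slot."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = value
    return vec


def multi_hot(indices, size, values=None):
    """Multi-hot (optionally with value): several non-zero slots."""
    vec = np.zeros(size, dtype=np.float32)
    if values is None:
        values = [1.0] * len(indices)
    for i, v in zip(indices, values):
        vec[i] = v
    return vec


# Hypothetical topic vocabulary.
TOPICS = ["ml", "systems", "cooking", "travel"]
topic_index = {t: i for i, t in enumerate(TOPICS)}

# numeric: a raw value such as content length
content_length = np.array([1250.0], dtype=np.float32)
# one-hot: the content's primary topic
primary_topic = one_hot(topic_index["ml"], len(TOPICS))
# multi-hot: all topics attached to the content
content_topics = multi_hot([topic_index["ml"], topic_index["systems"]], len(TOPICS))
# one-hot with value: the user's strongest interest and its weight
top_interest = one_hot(topic_index["cooking"], len(TOPICS), value=0.8)
# multi-hot with value: several user interests, each with a weight
user_interest = multi_hot(
    [topic_index["ml"], topic_index["travel"]], len(TOPICS), values=[0.7, 0.2]
)

print(primary_topic, content_topics, top_interest, user_interest)
```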

4. Feature design principles

Features should be as complete as possible, retain raw values, have high coverage, and maintain consistency between offline training and online serving.

5. New feature directions

New directions include explicit cross features that reduce the model's search space, business-driven features (e.g., video click propensity under Wi-Fi), and embedding-based features.
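As a rough sketch of what explicit cross features might look like, the snippet below crosses a user's topic weights with a content item's topics, and builds a categorical cross of network type and content type in the spirit of the Wi-Fi video example. The function names, output keys, and sample values are hypothetical; the source does not specify how the crosses are computed.

```python
def topic_cross(user_topic_weights, content_topics):
    """Cross user-interest topics with content topics into a few numeric features."""
    matched = {
        topic: user_topic_weights[topic]
        for topic in content_topics
        if topic in user_topic_weights
    }
    return {
        "matched_topic_count": len(matched),
        "max_matched_weight": max(matched.values(), default=0.0),
        "sum_matched_weight": sum(matched.values()),
    }


def categorical_cross(network, content_type):
    """Cross two categorical fields into one key, e.g. 'wifi_x_video'."""
    return f"{network}_x_{content_type}"


features = topic_cross({"ml": 0.7, "travel": 0.2}, ["ml", "systems"])
features["network_x_type"] = categorical_cross("wifi", "video")
print(features)
```

Handing the model a precomputed overlap means it does not have to discover the user-topic and content-topic relationship from raw IDs, which is the "reduced search space" the text refers to.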

5.1 Content embedding

Embedding maps items to a low-dimensional space where similar items are close. Methods are either text-based (TF-IDF, Word2Vec) or behavior-based, where item sequences from user sessions are fed to a skip-gram model trained with NCE loss.
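The behavior-based approach can be sketched in two parts: turning a session's item sequence into (center, context) training pairs, and scoring one pair against sampled negatives. Note one substitution: the source mentions NCE loss, whereas the loss below is the simpler negative-sampling objective popularized by word2vec, used here as a stand-in. The window size, embedding dimension, and item IDs are assumptions.

```python
import numpy as np


def skipgram_pairs(session, window=2):
    """Generate (center, context) pairs from one session's item sequence."""
    pairs = []
    for i, center in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, session[j]))
    return pairs


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss for one pair: -log s(c.o) - sum_k log s(-c.n_k), s = sigmoid."""
    positive = log_sigmoid(center_vec @ context_vec)
    negative = log_sigmoid(-(negative_vecs @ center_vec)).sum()
    return float(-(positive + negative))


pairs = skipgram_pairs(["q1", "q2", "q3"], window=1)
print(pairs)

# Hypothetical 8-dimensional embeddings for one pair and two sampled negatives.
rng = np.random.default_rng(0)
center, context = rng.normal(size=8), rng.normal(size=8)
negatives = rng.normal(size=(2, 8))
print(negative_sampling_loss(center, context, negatives))
```

Minimizing this loss pulls co-occurring items together and pushes sampled negatives apart, which is what makes items from similar sessions end up close in the embedding space.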

6. CTR model

The ranking objective can be stay‑time regression or click‑through‑rate (CTR) classification; the latter is a binary classification problem solved with cross‑entropy loss.
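For the CTR formulation, the cross-entropy loss over click labels can be written out directly. The sketch below assumes the model emits a raw score (logit) per impression that is squashed by a sigmoid; the sample labels and logits are made up.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all impressions."""
    labels = np.asarray(labels, dtype=np.float64)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    loss = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    return float(loss.mean())


clicks = [1, 0, 0, 1]                       # 1 = clicked, 0 = shown but not clicked
logits = np.array([2.0, -1.0, 0.5, 0.0])    # hypothetical model outputs
print(binary_cross_entropy(clicks, sigmoid(logits)))
```

Clipping the probabilities keeps `log` away from zero, which matters once a model becomes confidently wrong on a few samples.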

6.1 Model structures

Initial DNN: separate user and content blocks, each passed through fully-connected layers, concatenated, and fed to two more layers with a sigmoid output.

Optimized DNN: block-wise feature groups, each with its own hidden layer before concatenation.

DeepFM: adds a first-order term and an FM module; FM computes inner-product interactions between blocks, improving AUC by ~0.2%.

Last View + DIN: uses attention over the topics of the last viewed items to weight the current item embeddings.

Last Display + GRU: incorporates both clicked and non-clicked displayed items via a GRU before feeding into the DNN.

Multi-task learning: shares lower-layer weights across several objectives (CTR, favorite, like, comment, etc.) with a weighted loss, improving secondary metrics while keeping CTR stable.

Final model: combines the above components into a unified architecture.
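Two of these components reduce to short formulas: the FM term that sums inner products between block embeddings, and the weighted sum of per-task losses used in multi-task learning. The sketch below assumes each feature block has already been mapped to an embedding of the same dimension; the sample embeddings, task names, and task weights are hypothetical, since the source does not give the actual values.

```python
import numpy as np


def fm_pairwise(block_embeddings):
    """Sum of inner products over all pairs of block embeddings.

    Uses the identity sum_{i<j} <v_i, v_j> = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2),
    summed over embedding dimensions, so the cost is linear in the block count.
    """
    block_embeddings = np.asarray(block_embeddings, dtype=np.float64)
    sum_then_square = np.square(block_embeddings.sum(axis=0))
    square_then_sum = np.square(block_embeddings).sum(axis=0)
    return float(0.5 * (sum_then_square - square_then_sum).sum())


def weighted_multitask_loss(task_losses, task_weights):
    """Combine per-task losses into the single scalar that is backpropagated."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


# Three blocks (e.g. user, content, context), each a 2-dimensional embedding.
blocks = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
print(fm_pairwise(blocks))

losses = {"ctr": 0.6, "like": 0.2, "favorite": 0.1}
weights = {"ctr": 1.0, "like": 0.5, "favorite": 0.5}
print(weighted_multitask_loss(losses, weights))
```

Because the lower layers are shared, the task weights control how strongly each secondary objective pulls on the common representation relative to CTR.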

7. Experience sharing

Key engineering tips: record online statistical features at request time to avoid leakage; ensure offline‑online feature consistency; apply log transformation to large numeric features; check for NaN/Inf; cache user‑side computations; use large training data stored in FlatBuffer on HDFS; keep models auto‑updating.
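Two of these tips, the log transformation and the NaN/Inf check, are easy to sketch with NumPy. The helper names and sample values below are hypothetical, and `log1p` is chosen on the assumption that the features are non-negative counts (so that zero maps to zero).

```python
import numpy as np


def log_transform(values):
    """Compress large non-negative counts with log(1 + x)."""
    return np.log1p(np.asarray(values, dtype=np.float64))


def assert_finite(name, values):
    """Raise if a feature column contains NaN or Inf; otherwise return it."""
    values = np.asarray(values, dtype=np.float64)
    bad = ~np.isfinite(values)
    if bad.any():
        raise ValueError(f"feature {name!r} has {int(bad.sum())} NaN/Inf value(s)")
    return values


likes = [0, 12, 35000]  # hypothetical historical like counts
log_likes = assert_finite("log_likes", log_transform(likes))
print(log_likes)
```

Running the same two functions in both the offline training job and the online feature service is one simple way to keep the transformation consistent between the two paths.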

8. Current challenges

Recommendation differs from search: there is no explicit query, so results must balance relevance and diversity. Pointwise CTR models score each item independently and ignore interactions among items displayed together. Users tire of seeing similar topics repeatedly, so recommendations need to be diversified.

9. Future directions

One direction is reinforcement learning with an actor-critic framework. The actor generates a whole screen of recommendations based on past behavior, and the critic receives click/reward signals that are used to train both networks jointly. The aim is to capture real-time feedback and avoid fatigue, at the cost of higher model complexity and training difficulty.

Author and recruitment

Dan Houzhi, senior ranking engineer at Zhihu, shares his background and invites interested candidates to contact via the provided email and WeChat QR code.
