Graph Models in Baidu's Recommendation System: Foundations, Algorithms, and Feed Model Evolution
This article explains how graph models are applied in Baidu's recommendation system, covering graph basics, common graph algorithms such as graph embeddings and GNNs, and the evolution and deployment of the Feed graph model across multiple product lines.
Introduction
The talk is organized into three parts: background on graphs, commonly used graph algorithms, and the evolution of the Feed graph model in Baidu's recommendation system.
Graph Background
Graphs are a common language for describing complex data such as social networks, molecular structures, knowledge graphs, advertising, and maps. Typical graph tasks include node classification, link prediction, community detection, network similarity, and graph similarity, with the talk focusing on node classification and link prediction for recommendation.
Common Graph Algorithms
Graph modeling projects high‑dimensional nodes into a low‑dimensional space while preserving relationships. Main algorithm families are walk‑based graph embedding (GE) and message‑passing graph neural networks (GNN).
Graph Embedding (GE): Generates random walks over the graph and trains a shallow encoder on the resulting node sequences, analogous to word2vec treating walks as sentences. Variants include DeepWalk (uniform random walks), Node2Vec (parameters p and q bias walks toward BFS‑like or DFS‑like exploration), and Metapath2Vec (handles heterogeneous graphs via predefined meta‑paths).
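The Node2Vec bias described above can be sketched as a second‑order random walk: the probability of the next hop depends on the previous node, with 1/p weighting a return step and 1/q weighting a step away from the previous node's neighborhood. This is a minimal illustrative sketch (the function name and the dict‑of‑sets graph representation are our own choices, not from the talk):

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Second-order (Node2Vec-style) biased random walk.

    adj: dict mapping each node to a set of its neighbors (undirected graph).
    p:   return parameter; small p makes revisiting the previous node likely.
    q:   in-out parameter; small q pushes the walk outward (DFS-like),
         large q keeps it local (BFS-like).
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])          # sorted for deterministic demos
        if not nbrs:
            break                        # dead end
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first hop is unbiased
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay within prev's neighborhood
            else:
                weights.append(1.0 / q)  # explore away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Tiny demo graph: a triangle a-b-c with a pendant node d.
adj = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b', 'd'}, 'd': {'c'}}
walk = node2vec_walk(adj, 'a', length=6, p=0.5, q=2.0)
```

With p=1 and q=1 the weights are uniform and this degenerates to a DeepWalk‑style unbiased walk; the generated sequences would then be fed to a word2vec‑style skip‑gram trainer.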
Graph Neural Networks (GNN): Use deep encoders with message passing to aggregate information from multi‑hop neighborhoods. Popular variants include GraphSAGE (neighbor sampling plus mean/max‑pooling/LSTM aggregation), GCN (degree‑based normalization of neighbor contributions), and GAT (learned attention weights over neighbors).
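One message‑passing layer can be made concrete with GraphSAGE's mean aggregator: each node averages its neighbors' features, combines that with its own representation through two learned projections, and applies a nonlinearity. A minimal NumPy sketch, assuming dense feature matrices and an adjacency dict (the function and weight names are illustrative, not from the talk):

```python
import numpy as np

def sage_mean_layer(H, adj, W_self, W_nbr):
    """One GraphSAGE layer with mean aggregation (sketch).

    H:      (num_nodes, d_in) node feature matrix.
    adj:    dict mapping node index -> list of neighbor indices.
    W_self: (d_in, d_out) projection for the node's own features.
    W_nbr:  (d_in, d_out) projection for the aggregated neighbor features.
    """
    out = []
    for v in range(H.shape[0]):
        nbrs = adj[v]
        # Mean-aggregate neighbor messages (zero vector if isolated).
        h_nbr = H[nbrs].mean(axis=0) if nbrs else np.zeros(H.shape[1])
        h = H[v] @ W_self + h_nbr @ W_nbr
        h = np.maximum(h, 0.0)                 # ReLU
        norm = np.linalg.norm(h)
        out.append(h / norm if norm > 0 else h)  # L2-normalize, as in GraphSAGE
    return np.stack(out)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
W_self = rng.normal(size=(3, 2))
W_nbr = rng.normal(size=(3, 2))
H1 = sage_mean_layer(H, adj, W_self, W_nbr)
```

Stacking k such layers lets each node see its k‑hop neighborhood, which is the "multi‑hop aggregation" GE methods lack; in production the neighbor list would be sampled rather than used in full.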
GE offers high efficiency with shallow encoders but limited expressiveness, while GNN provides stronger representation power at higher computational cost.
Feed Graph Model Evolution
The Feed graph model progressed through three stages: (1) Item2Vec/User2Vec using first‑order neighbors; (2) a Siamese network adding supervised metric learning; (3) graph‑embedding‑based link prediction incorporating high‑order connections.
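The Siamese stage above can be illustrated with the standard contrastive (margin) loss used in supervised metric learning: a shared encoder maps both items of a pair into the same space, positive pairs are pulled together, and negative pairs are pushed apart up to a margin. A minimal sketch of the loss only (the talk does not specify the exact loss; this is one common choice):

```python
import numpy as np

def contrastive_loss(u, v, label, margin=1.0):
    """Contrastive loss on a pair of embeddings from a shared encoder.

    label=1: positive pair, penalize squared distance.
    label=0: negative pair, penalize only if closer than the margin.
    """
    d = np.linalg.norm(u - v)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In training, the same encoder weights produce both u and v (the "Siamese" twin towers), and the loss gradient shapes the embedding space so that nearest‑neighbor lookup reflects the supervised relation.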
Traditional representation learning has two drawbacks: it requires a separate heuristic task for each relation type, and it discards high‑order structure when only first‑order neighbors are used.
To address cold‑start and low‑frequency resource issues, side‑information features and popularity‑based debiasing are introduced, and multiple node types (users, items, queries) are embedded into a unified space, supporting various recall modes such as UserCF, ItemCF, and Lookalike.
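Once users, items, and queries share one embedding space, the recall modes listed above all reduce to the same nearest‑neighbor primitive: UserCF queries with a user vector against user embeddings, ItemCF with an item vector against item embeddings, and Lookalike with a seed‑user vector against the user pool. A minimal cosine‑similarity sketch (brute force; production systems would use an ANN index):

```python
import numpy as np

def topk_neighbors(query_vec, emb, k=3):
    """Top-k cosine-similarity lookup in a shared embedding space.

    query_vec: (d,) query embedding (user, item, or query node).
    emb:       (n, d) candidate embeddings of any node type.
    Returns (indices, similarities), best first.
    """
    q = query_vec / np.linalg.norm(query_vec)
    E = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = E @ q
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]

item_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx, sims = topk_neighbors(np.array([1.0, 0.0]), item_emb, k=2)
```

Because all node types live in one space, the same index can also serve cross‑type recall (e.g. user vector against item embeddings), which is what removes the need for per‑relation heuristic tasks.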
Extending the Model to Other Product Lines
The model was later extended to native ads and Tieba, undergoing four phases: (1) using heterogeneous graphs of news and ads with skip‑gram embeddings; (2) focusing on ad clicks only for cleaner training data; (3) adding more user behavior signals (search, video, etc.) for richer graphs; (4) incorporating generalized attribute features for better generalization on new users and ads.
In Tieba, meta‑paths like post‑forum‑post and user‑post‑forum‑post‑user were designed to leverage node type information, improving accuracy and recall.
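A meta‑path like post‑forum‑post constrains each step of the random walk to a required node type, so the resulting sequences respect the heterogeneous schema. A minimal sketch in the spirit of Metapath2Vec, assuming a node‑type lookup dict and a meta‑path whose last type equals its first so it can repeat cyclically (names are illustrative):

```python
import random

def metapath_walk(adj, ntype, start, metapath, length, rng=None):
    """Random walk constrained to follow a cyclic meta-path of node types.

    adj:      dict node -> set of neighbors (heterogeneous graph).
    ntype:    dict node -> type string, e.g. 'post', 'forum', 'user'.
    metapath: type sequence such as ['post', 'forum', 'post']; the last
              type must equal the first so the pattern can repeat.
    """
    assert ntype[start] == metapath[0], "start node must match the first type"
    rng = rng or random.Random(0)
    walk = [start]
    step = 0
    while len(walk) < length:
        step += 1
        # Cycle through the pattern; len-1 because last type == first type.
        want = metapath[step % (len(metapath) - 1)]
        cand = sorted(x for x in adj[walk[-1]] if ntype[x] == want)
        if not cand:
            break  # no neighbor of the required type
        walk.append(rng.choice(cand))
    return walk

# Tiny post-forum graph for the 'post-forum-post' meta-path.
adj = {'p1': {'f1'}, 'p2': {'f1'}, 'f1': {'p1', 'p2'}}
ntype = {'p1': 'post', 'p2': 'post', 'f1': 'forum'}
walk = metapath_walk(adj, ntype, 'p1', ['post', 'forum', 'post'], length=5)
```

Filtering candidates by type is what injects the schema knowledge: a plain walk on user‑post‑forum graphs would mix semantics, while a user‑post‑forum‑post‑user walk yields user pairs connected through shared forums.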
Overall, the talk demonstrates how graph models enhance Baidu's recommendation pipelines by capturing multi‑relational, high‑order information and scaling across diverse product lines.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.