Graph Models in Baidu Recommendation System: Background, Algorithms, and Evolution
This article introduces the use of graph models in Baidu's recommendation system, covering graph fundamentals, common graph algorithms such as graph embedding and graph neural networks, the evolution of the Feed graph model, and its subsequent promotion across multiple product lines.
Overview – The presentation is divided into three parts: an introduction to graph background, a review of commonly used graph algorithms, and the evolution history of the Feed graph model within Baidu's recommendation system.
Graph Background – Graphs are a versatile language for representing complex relational data such as social networks, knowledge graphs, and user–ad interactions in advertising. Typical graph tasks include node classification, link prediction, community detection, network similarity, and graph similarity; the talk focuses on node classification and link prediction, the two most relevant to recommendation.
Common Graph Algorithms – Graph algorithms fall into two main families: walk‑based graph embedding (GE) and message‑passing‑based graph neural networks (GNN). GE methods (e.g., DeepWalk, Node2Vec, Metapath2Vec) generate random walks and use shallow encoders to learn low‑dimensional node embeddings. GNNs use deep encoders that aggregate multi‑hop neighbor information, with variants such as GraphSAGE, GCN, and GAT that differ in neighbor aggregation and attention mechanisms.
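To make the walk-based family concrete, here is a minimal DeepWalk-style sketch: generate truncated random walks over a toy adjacency list, whose node sequences could then be fed to a skip-gram model (e.g., gensim's Word2Vec) to learn shallow node embeddings. The graph and parameters are illustrative, not from the talk.

```python
import random

def random_walks(adj, walk_len=10, walks_per_node=5, seed=0):
    """Generate truncated random walks (DeepWalk-style sketch).

    adj: dict mapping node -> list of neighbor nodes (toy graph).
    Returns a list of walks; each walk is a sequence of node ids,
    usable as "sentences" for skip-gram training.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:          # start several walks from every node
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:       # dead end: truncate the walk
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy graph: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = random_walks(adj)
```

Node2Vec and Metapath2Vec differ from this baseline only in how the next step is sampled (biased second-order transitions, or type-constrained transitions, respectively).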
Algorithm Comparison – Graph embedding offers high efficiency with shallow encoders but limited expressive power, while GNNs provide stronger representation and generalization at the cost of higher computational overhead.
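The "deep encoder" side of the comparison can be illustrated with one GraphSAGE-style layer using mean aggregation. This is a hedged toy sketch: scalar weights `w_self` and `w_nbr` stand in for the learned weight matrices of a real implementation, and the graph/features are invented for illustration.

```python
def sage_layer(adj, feats, w_self, w_nbr):
    """One GraphSAGE-style layer with mean aggregation (toy sketch).

    adj:   node -> list of neighbor nodes
    feats: node -> feature vector (list of floats)
    Each node's new representation combines its own features with the
    mean of its neighbors' features, followed by a ReLU nonlinearity.
    """
    out = {}
    for v, nbrs in adj.items():
        dim = len(feats[v])
        if nbrs:
            mean = [sum(feats[u][i] for u in nbrs) / len(nbrs)
                    for i in range(dim)]
        else:
            mean = [0.0] * dim
        out[v] = [max(0.0, w_self * feats[v][i] + w_nbr * mean[i])
                  for i in range(dim)]
    return out
```

Stacking k such layers lets each node see its k-hop neighborhood, which is the source of both the stronger expressive power and the higher computational cost noted above; GCN and GAT replace the plain mean with normalized or attention-weighted aggregation.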
Feed Graph Model Evolution – The Feed model progressed through three stages: (1) Item2Vec/User2Vec using first‑order neighbors, (2) a Siamese network adding supervised similarity learning, and (3) a graph‑embedding‑based approach that incorporates high‑order connections for link prediction.
Challenges and Solutions – Traditional methods require a separate heuristic task for each relation type and discard high‑order connectivity information. Introducing side‑information features and popularity‑aware walk probabilities mitigates cold‑start and sparsity issues, while multi‑task learning unifies the various recall modes under one model.
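The talk only states that walk probabilities are made popularity-aware; one plausible realization, sketched below under that assumption, is an inverse-power weighting that makes walks visit long-tail (cold-start) items more often than a uniform random walk would.

```python
import random

def popularity_aware_step(neighbors, popularity, alpha=0.5, rng=random):
    """Pick the next walk node with probability ∝ popularity**(-alpha).

    A hedged sketch: alpha=0 recovers the uniform random walk, while
    larger alpha shifts probability mass toward unpopular (long-tail)
    neighbors. `popularity` maps node -> positive interaction count.
    """
    weights = [popularity[n] ** (-alpha) for n in neighbors]
    total = sum(weights)
    r = rng.random() * total       # sample a point on the weight line
    acc = 0.0
    for n, w in zip(neighbors, weights):
        acc += w
        if r <= acc:
            return n
    return neighbors[-1]           # guard against floating-point edge cases
```

Plugging this step function into the walk generator above yields popularity-aware walks without changing anything else in the pipeline.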
Model Promotion – The graph model was later extended to native ads and Tieba. Four promotion phases refined the graph construction: (1) using heterogeneous user‑news‑ad graphs, (2) focusing on pure ad interactions, (3) adding richer user behavior signals, and (4) incorporating generalized attribute embeddings for better generalization. Metapath2Vec strategies were employed in Tieba to leverage node‑type information.
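The Metapath2Vec strategy mentioned for Tieba constrains each walk step to follow a type pattern. Below is a minimal metapath-guided walk over a heterogeneous graph; the node types and the "user–bar" metapath are hypothetical stand-ins for whatever schema Tieba actually uses.

```python
import random

def metapath_walk(adj, node_type, start, metapath, walk_len, rng):
    """Metapath-guided random walk (Metapath2Vec-style sketch).

    adj:       node -> list of neighbor nodes
    node_type: node -> type label (e.g., 'user', 'bar'; illustrative)
    metapath:  cyclic list of type labels; `start` should match metapath[0].
    At each step, only neighbors whose type matches the next entry of
    the cyclic metapath are eligible.
    """
    walk = [start]
    i = 0
    while len(walk) < walk_len:
        want = metapath[(i + 1) % len(metapath)]
        cands = [u for u in adj[walk[-1]] if node_type[u] == want]
        if not cands:              # no neighbor of the required type
            break
        walk.append(rng.choice(cands))
        i += 1
    return walk
```

Running skip-gram on such type-alternating walks is what lets the model exploit node-type information instead of treating the heterogeneous graph as homogeneous.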
Conclusion – The session summarized the practical application of graph models in Baidu's Feed recommendation, highlighting their impact on recall performance and future directions.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.