Graph Model Practices and Applications in Baidu Recommendation System
This article introduces the background of graph data, explains common graph modeling algorithms such as graph embedding and graph neural networks, compares their trade‑offs, and details the evolution and large‑scale deployment of Feed graph models in Baidu's recommendation platform.
Overview – The presentation consists of three parts: an introduction to graph background, a review of common graph model algorithms, and the evolution history of Feed graph models used in Baidu's recommendation system.
1. Graph Background – Graphs are a common language for representing complex data such as social networks, molecular structures, knowledge graphs, advertising, and maps. Typical graph tasks include node classification, link prediction, community detection, network similarity, and graph similarity. The talk focuses on node classification and link prediction for recommendation scenarios.
2. Common Graph Model Algorithms
Graph models aim to embed nodes of a high‑dimensional graph into a low‑dimensional vector space while preserving their relational structure. Two major families dominate:
Walk‑based graph embedding (GE) – generates random walks and encodes each node with a shallow encoder, following the word2vec principle. Popular variants include DeepWalk, Node2Vec, and Metapath2Vec.
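The walk‑then‑encode idea can be sketched in a few lines. The sketch below generates uniform random walks over a toy graph (the graph and node names are made up for illustration) and converts them into (center, context) skip‑gram pairs, which a word2vec‑style trainer would then consume; it is a minimal DeepWalk‑flavored illustration, not Baidu's implementation.

```python
import random

def random_walk(graph, start, length, rng):
    """Uniform random walk of `length` nodes starting at `start`.
    `graph` maps each node to a list of its neighbors."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def skipgram_pairs(walks, window):
    """Turn walks into (center, context) pairs, exactly as word2vec
    treats sentences of words."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

# Toy graph: two triangles bridged by the edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
rng = random.Random(0)
walks = [random_walk(graph, n, length=5, rng=rng) for n in graph for _ in range(10)]
pairs = skipgram_pairs(walks, window=2)
```

Node2Vec and Metapath2Vec differ only in how the next step of the walk is chosen (biased second‑order transitions, or type‑constrained transitions), not in the skip‑gram objective.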
Message‑passing graph neural networks (GNN) – use deep encoders that aggregate multi‑hop neighbor information. Typical variants are GraphSAGE, GCN, and GAT, differing in how they sample neighbors and combine messages.
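The aggregation step these models share can be illustrated with a minimal GraphSAGE‑style mean aggregator (weights and the toy graph below are arbitrary, for illustration only): each layer combines a node's own features with the mean of its neighbors' features, so stacking two layers exposes each node to its 2‑hop neighborhood.

```python
import numpy as np

def sage_layer(h, graph, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator:
    h'_v = ReLU(W_self @ h_v + W_neigh @ mean({h_u : u in N(v)}))."""
    out = {}
    for v, neigh in graph.items():
        agg = (np.mean([h[u] for u in neigh], axis=0)
               if neigh else np.zeros_like(h[v]))
        out[v] = np.maximum(W_self @ h[v] + W_neigh @ agg, 0.0)
    return out

# Toy graph with random 4-d input features.
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
rng = np.random.default_rng(0)
h0 = {v: rng.normal(size=4) for v in graph}
W1_self, W1_neigh = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
W2_self, W2_neigh = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

h1 = sage_layer(h0, graph, W1_self, W1_neigh)   # 1-hop information
h2 = sage_layer(h1, graph, W2_self, W2_neigh)   # 2-hop information
```

GCN replaces the mean with a symmetrically normalized sum, and GAT replaces it with a learned attention‑weighted sum; the message‑passing skeleton is the same.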
3. Algorithm Comparison – Graph embedding uses shallow encoders, offering high efficiency but limited expressive power and generalization. GNNs employ deep encoders, providing stronger representation ability and better generalization at the cost of higher computational overhead.
4. Feed Graph Model Evolution
The Feed graph model has undergone three stages:
Stage 1 – Item2Vec/User2Vec: each user is represented by the centroid (average embedding) of the items they directly clicked, i.e., first‑order neighbors only.
Stage 2 – Siamese network with metric learning, still using first‑order neighbors but adding supervised similarity learning.
Stage 3 – Graph‑embedding‑based link prediction, injecting high‑order connections to improve overall connectivity.
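The Stage 1 user representation amounts to one line of arithmetic. The sketch below (item ids and embeddings are hypothetical) shows the centroid construction and its obvious failure mode, a user with no clicks, which is one reason the later stages move beyond first‑order neighbors.

```python
import numpy as np

# Hypothetical item embedding table.
item_emb = {
    "i1": np.array([1.0, 0.0]),
    "i2": np.array([0.0, 1.0]),
    "i3": np.array([1.0, 1.0]),
}

def user2vec(clicked_ids, item_emb):
    """Stage-1 style user vector: centroid of the embeddings of the
    items the user clicked (first-order neighbors in the click graph)."""
    vecs = [item_emb[i] for i in clicked_ids if i in item_emb]
    if not vecs:  # cold-start user: no clicks, so no signal at all
        dim = len(next(iter(item_emb.values())))
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

u = user2vec(["i1", "i2"], item_emb)  # -> array([0.5, 0.5])
```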
Traditional representation learning suffers from (1) the need for multiple heuristic tasks to model different relations, and (2) loss of high‑order information when only first‑order neighbors are used.
5. Model Promotion
The graph model was first deployed in Feed recall in 2019 and later extended to multiple tasks and modalities (text, video, mini‑programs). It supports multi‑task learning across various scenes, maps heterogeneous resources (users, items, queries) into a unified vector space, and addresses cold‑start and popularity bias via side‑information features and debiasing strategies.
Subsequent promotion phases include:
Phase 1 – Heterogeneous graph of "good‑looking" user content and ads, using skip‑gram to learn embeddings.
Phase 2 – Migrating user‑content clicks to ad clicks, improving data purity.
Phase 3 – Applying the graph to video ad recall, enriching user behavior signals.
Phase 4 – Introducing generalized attribute features for users, ads, and content to enhance representation and generalization.
In the Tieba scenario, metapath‑based walks (e.g., post‑forum‑post, user‑post‑forum‑post‑user) were added to better exploit node‑type information, yielding higher precision and recall.
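A metapath‑constrained walk differs from a plain random walk only in that each step must land on the next node type in the path. The sketch below (the tiny user/post/forum graph is made up for illustration) follows the user‑post‑forum‑post‑user pattern mentioned above, in the spirit of Metapath2Vec.

```python
import random

def metapath_walk(adj, node_type, start, metapath, rng):
    """One walk that follows node types in `metapath`: at each step,
    move only to a neighbor whose type matches the next entry."""
    assert node_type[start] == metapath[0]
    walk = [start]
    for want in metapath[1:]:
        candidates = [n for n in adj[walk[-1]] if node_type[n] == want]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk

# Toy Tieba-style graph: two users, two posts, one forum.
adj = {
    "u1": ["p1"], "u2": ["p2"],
    "p1": ["u1", "f1"], "p2": ["u2", "f1"],
    "f1": ["p1", "p2"],
}
node_type = {"u1": "user", "u2": "user",
             "p1": "post", "p2": "post", "f1": "forum"}

walk = metapath_walk(adj, node_type, "u1",
                     ["user", "post", "forum", "post", "user"],
                     random.Random(0))
```

Because every step is type‑checked, the resulting walks pair users who post in the same forum, which is precisely the high‑order, type‑aware signal the plain walks miss.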
Overall, the talk demonstrates how graph‑based methods can model multi‑relation, high‑order interactions in large‑scale recommendation systems, achieving significant performance gains.
Thank you for your attention.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.