Graph Neural Networks for Recommendation: Principles, Frameworks, and Tencent Practice
This article introduces graph neural networks, explains their fundamentals and GraphSAGE/DGI algorithms, and demonstrates how Tencent applies them to recommendation scenarios such as video and WeChat content, highlighting network construction, feature engineering, sampling and aggregation techniques, and practical performance gains.
Introduction – As data grows in volume and diversity, graph computing has become a key research direction. Graph neural networks (GNNs) learn representations on graph structures by aggregating neighbor features, making them a natural fit for modern recommendation systems.
Graph Neural Network Overview – Traditional graph algorithms (e.g., PageRank, closeness centrality) operate on topology alone, while representation-learning methods such as DeepWalk, node2vec, and especially GNNs also exploit node and edge features. A GNN aggregates neighbor information together with the node's own features to produce expressive embeddings.
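The aggregate-then-combine idea can be sketched in a few lines. The toy graph, feature vectors, and mean aggregator below are illustrative only, not Angel's actual implementation:

```python
import numpy as np

def aggregate(node_feats, adj, node):
    """Mean-aggregate a node's neighbor features and concatenate
    the result with its own feature vector (the core GNN update)."""
    neigh_mean = np.mean([node_feats[v] for v in adj[node]], axis=0)
    return np.concatenate([node_feats[node], neigh_mean])

# toy graph: node 0 is connected to nodes 1 and 2
feats = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0]), 2: np.array([0.0, 4.0])}
adj = {0: [1, 2]}
emb = aggregate(feats, adj, 0)  # → [1.0, 0.0, 0.0, 3.0]
```

The resulting embedding keeps the node's own signal (first two dimensions) alongside a summary of its neighborhood (last two), which is what gives GNN embeddings their extra expressiveness over topology-only methods.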
Angel GNN Framework – Angel provides a suite of GNN algorithms (GraphSAGE, supervised/unsupervised, homogeneous/heterogeneous). The framework separates sampling (first‑order and second‑order neighbor sampling) from aggregation, storing first‑order features on Spark executors for fast access and second‑order adjacency on parameter servers.
GraphSAGE Principle – GraphSAGE rests on two core steps: (1) sampling a fixed number of a node's first-order neighbors at random, then sampling their neighbors to obtain second-order neighbors; (2) aggregating inside-out: second-order features are aggregated into the first-order neighbors, and that result is combined with the target node's own features to produce the final embedding.
Sampling and Aggregation Details – Sampling proceeds hierarchically: select a target node, randomly sample its immediate neighbors, then sample neighbors of those neighbors. Aggregation first combines second‑order neighbor features, merges them with first‑order features, and finally fuses the result with the node’s own feature to produce a rich embedding.
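The two-hop procedure above can be sketched as follows. This is a plain-NumPy toy: the sampling sizes `k1`/`k2`, the mean aggregator, and the concatenation-based fusion are simplifications of what GraphSAGE actually trains, not Angel's implementation:

```python
import random
import numpy as np

def sample_neighbors(adj, node, k):
    """Randomly sample k neighbors (with replacement); isolated nodes fall back to self."""
    neigh = adj.get(node) or [node]
    return [random.choice(neigh) for _ in range(k)]

def graphsage_embed(adj, feats, node, k1=2, k2=2):
    """Two-hop GraphSAGE-style embedding: sample first-order neighbors,
    then second-order neighbors; aggregate second-order features into each
    first-order neighbor, then fuse the result with the node itself."""
    hop1 = sample_neighbors(adj, node, k1)
    hop1_vecs = []
    for n1 in hop1:
        hop2 = sample_neighbors(adj, n1, k2)
        hop2_mean = np.mean([feats[n2] for n2 in hop2], axis=0)
        hop1_vecs.append(np.concatenate([feats[n1], hop2_mean]))
    neigh_agg = np.mean(hop1_vecs, axis=0)           # merge first-order summaries
    return np.concatenate([feats[node], neigh_agg])  # fuse with the node's own feature

# toy graph with 2-dimensional features
feats = {n: np.full(2, float(n)) for n in range(4)}
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
emb = graphsage_embed(adj, feats, 0)  # length 2 + (2 + 2) = 6
```

In a trained model the concatenations would be followed by learned weight matrices and nonlinearities; the sketch only shows how the sampled hierarchy flows back into a single embedding.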
Recommendation Scenario 1: Tencent Video – User watch histories are used to build a graph where users and videos are nodes. Features include user attributes, video attributes, and sequential watch patterns processed by a Transformer. GraphSAGE is applied on a graph with tens of millions of nodes and billions of edges, yielding a ~3% lift in top‑50 recall and modest improvements in watch time.
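A bipartite user–video graph of this kind might be built from watch logs roughly as follows; the log format and the 30-second noise threshold are hypothetical, chosen only to illustrate edge filtering:

```python
from collections import defaultdict

# toy watch log: (user, video, watch_seconds)
watch_log = [
    ("u1", "v1", 120), ("u1", "v2", 5),
    ("u2", "v1", 300), ("u2", "v3", 90),
]

MIN_WATCH = 30  # hypothetical threshold: drop very short plays as noise

adj = defaultdict(list)
for user, video, secs in watch_log:
    if secs >= MIN_WATCH:           # edge-level noise filtering
        adj[user].append(video)     # user -> video edge
        adj[video].append(user)     # video -> user edge (bipartite, undirected)
```

On top of such a graph, node features (user attributes, video attributes, and Transformer-encoded watch sequences) are attached before GraphSAGE sampling and aggregation run at the scale the article describes.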
Recommendation Scenario 2: WeChat Content – For public‑account recommendation, the initial user‑account follow graph contains super‑nodes (e.g., “People’s Daily”). The graph is transformed into an account‑to‑account graph, and noisy edges are filtered. Features from accounts and users are denoised, and DGI is chosen over GraphSAGE for better performance, resulting in notable gains in exposure, click‑through, and follow rates.
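The account-to-account transformation can be sketched as a co-follow projection with super-node removal. The follower cap and co-follow threshold below are hypothetical illustrations of the filtering the article mentions:

```python
from collections import defaultdict
from itertools import combinations

# toy follow edges (user, account); "big" plays the role of a super-node
follows = [("u1", "a1"), ("u1", "a2"), ("u2", "a1"), ("u2", "a2"),
           ("u1", "big"), ("u2", "big"), ("u3", "big"), ("u4", "big")]

MAX_FOLLOWERS = 3  # hypothetical cap; accounts above it are treated as super-nodes

followers = defaultdict(set)
for u, a in follows:
    followers[a].add(u)

# drop super-nodes before projecting: their edges carry little signal
valid = {a for a, us in followers.items() if len(us) <= MAX_FOLLOWERS}

# project: two accounts are linked when the same user follows both
by_user = defaultdict(set)
for u, a in follows:
    if a in valid:
        by_user[u].add(a)

co = defaultdict(int)
for accs in by_user.values():
    for a, b in combinations(sorted(accs), 2):
        co[(a, b)] += 1

MIN_COFOLLOW = 2  # hypothetical: filter noisy edges with weak co-follow support
edges = {pair: w for pair, w in co.items() if w >= MIN_COFOLLOW}
```

DGI (or GraphSAGE) then runs on the resulting account graph; the article reports DGI performed better in this scenario.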
Experience Summary – Effective network construction and feature engineering are critical; noise reduction greatly impacts results. Algorithms must be tailored to specific scenarios, and multi‑model fusion (e.g., GNN+Transformer, GNN+XGBoost) often yields the best outcomes.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.