Graph Models in Information Feed Recommendation: Principles and Practice
This article introduces graph‑modeling concepts and shows how they are applied to large‑scale information‑feed recall. It details specific algorithms such as DeepWalk, LINE, and GraphSAGE; covers feature engineering, loss design, training, deployment, and evaluation; and discusses current challenges and future directions.
The talk begins with a brief overview of graph models, highlighting that graphs consist of nodes and edges, can be directed or undirected, and may carry labels and attributes. By constructing a graph of users, items, and their interactions, various tasks such as node prediction, edge prediction, and sub‑graph prediction become possible for recommendation scenarios.
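As a minimal illustration of such a user–item interaction graph (the user names, item IDs, and click log below are hypothetical), a bipartite graph can be kept as two adjacency lists, which already supports simple edge‑prediction reasoning:

```python
from collections import defaultdict

# Hypothetical interaction log: (user, item) click pairs.
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")]

# Bipartite adjacency lists: users point to items and vice versa,
# which supports node-, edge-, and subgraph-level prediction tasks.
user_to_items = defaultdict(set)
item_to_users = defaultdict(set)
for user, item in interactions:
    user_to_items[user].add(item)
    item_to_users[item].add(user)

def common_neighbors(user, item):
    """Edge prediction asks: is (user, item) a likely future edge?
    One simple signal: users who clicked `item` and share at least
    one clicked item with `user`."""
    return {u for u in item_to_users[item]
            if user_to_items[u] & user_to_items[user]}
```

For example, `common_neighbors("u2", "i1")` finds that `u1` both clicked `i1` and overlaps with `u2` on `i2`, hinting that the edge (`u2`, `i1`) is plausible.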
It then describes the information‑feed recall business, where millions of candidate articles must be filtered to a few thousand within tens of milliseconds. Traditional recall methods (ItemCF, UserCF, FM, DSSM, YouTubeDNN, DeepMatch) are mentioned, followed by the recent popularity of graph‑based embeddings.
The deployment pipeline is outlined: (1) Graph construction – defining user and item nodes, building bipartite or homogeneous edges based on interactions; (2) Feature engineering – user demographics, item categories, timestamps, etc.; (3) Edge and node weighting – assigning importance to clicks, likes, dwell time, and applying exponential scaling for smoother distributions.
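Step (3) can be sketched as follows. The per‑behavior weights and the exact smoothing scheme are assumptions for illustration; one plausible reading of "exponential scaling" is raising the raw weight to an exponent below 1, which compresses the heavy tail of the weight distribution:

```python
# Hypothetical per-behavior importance; real weights would be tuned offline.
BEHAVIOR_WEIGHT = {"click": 1.0, "like": 3.0}

def edge_weight(behaviors, dwell_seconds, alpha=0.5):
    """Combine interaction signals into one edge weight, then apply
    an exponent alpha < 1 to compress the heavy tail of the raw
    distribution (one plausible form of 'exponential scaling')."""
    raw = sum(BEHAVIOR_WEIGHT.get(b, 0.0) for b in behaviors)
    raw += dwell_seconds / 60.0          # one point per minute of dwell
    return raw ** alpha

# A heavily engaged edge and a light one end up closer together:
heavy = edge_weight(["click", "like"], dwell_seconds=300)  # raw 9.0 -> 3.0
light = edge_weight(["click"], dwell_seconds=0)            # raw 1.0 -> 1.0
```

Without the exponent, a single very active user–item pair could dominate random walks and neighbor sampling downstream; the smoothing keeps popular edges influential without letting them swamp the graph.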
Four graph algorithms are presented:
DeepWalk – random walks on homogeneous item‑item graphs, training node vectors with a Word2Vec‑style objective.
LINE – fast training of first‑order and second‑order similarity on bipartite user‑item graphs, using negative sampling and sampled softmax.
GraphSAGE – multi‑hop neighbor aggregation (mean, sum, or attention) to generate expressive user and item embeddings, suitable for dynamic graphs.
Loss functions – Cross‑Entropy as a baseline, weighted Cross‑Entropy, pairwise BPR loss, and margin‑based Triplet loss, with strategies for weighting positive samples and applying time decay.
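To make the DeepWalk step above concrete, here is a minimal sketch of truncated random walks over a homogeneous item–item graph. The graph and walk parameters are hypothetical; in practice each walk is then treated as a "sentence" and fed to a Word2Vec‑style skip‑gram trainer to learn the node vectors:

```python
import random

# Hypothetical item-item graph (e.g., edges built from co-click behavior).
graph = {
    "i1": ["i2", "i3"],
    "i2": ["i1", "i3"],
    "i3": ["i1", "i2"],
}

def random_walk(graph, start, length, rng):
    """One truncated random walk starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(graph, walks_per_node, length, seed=0):
    """Several walks per node; the corpus of walks plays the role
    of sentences for the Word2Vec-style objective."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph:
            walks.append(random_walk(graph, node, length, rng))
    return walks

walks = generate_walks(graph, walks_per_node=2, length=5)
```

Each consecutive pair in a walk is a valid edge, so co‑occurrence within a walk window approximates graph proximity, which is exactly what the skip‑gram objective then encodes into the embeddings.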
Training tricks include positive‑sample weighting, time‑based decay, and taming the massive negative‑sample space with type‑aware sampling and hard‑negative mining (e.g., items ranked 101–500 by model score). The model is trained on a distributed TensorFlow platform with separate GraphServer, GraphClient, Parameter Server, and Worker roles.
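One common reading of "Top 101–500" hard negatives is: score every candidate with the current model, skip the top 100 (which likely contain true positives), and sample negatives from ranks 101–500 while excluding known positives. A minimal sketch with synthetic scores (the band boundaries and pool are illustrative, not the talk's exact configuration):

```python
import random

def mine_hard_negatives(scored_items, positives, lo=100, hi=500, k=5, seed=0):
    """scored_items: list of (item_id, score) for one user.
    Returns k sampled items from ranks lo+1..hi (0-indexed slice
    [lo:hi]) that are not known positives -- 'hard' negatives because
    the model scores them highly, yet the user never interacted."""
    ranked = sorted(scored_items, key=lambda x: x[1], reverse=True)
    band = [item for item, _ in ranked[lo:hi] if item not in positives]
    rng = random.Random(seed)
    return rng.sample(band, min(k, len(band)))

# Hypothetical candidate pool of 1000 items with descending scores.
pool = [(f"item{i}", 1000 - i) for i in range(1000)]
negs = mine_hard_negatives(pool, positives={"item0", "item150"})
```

Compared with uniform random negatives, these band‑sampled negatives sit near the decision boundary, which tends to give the model a stronger gradient signal.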
Evaluation comprises qualitative checks, vector visualization (PCA/TSNE), heat‑map correlation analysis, and quantitative metrics such as Recall, AUC, NDCG, MRR, MAP, diversity, relevance, and timeliness, followed by online A/B testing.
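Two of the quantitative metrics above, Recall@K and MRR, are simple enough to sketch directly; the ranked list and ground‑truth set below are hypothetical:

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(recommended, relevant):
    """Reciprocal rank of the first relevant item (0.0 if none appears)."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

recs = ["i3", "i1", "i7", "i2"]   # model's ranked list (hypothetical)
truth = {"i1", "i2"}              # items the user actually engaged with

r_at_2 = recall_at_k(recs, truth, k=2)   # only i1 in top-2 -> 0.5
first_hit = mrr(recs, truth)             # first hit at rank 2 -> 0.5
```

Offline metrics like these gate which embedding variants graduate to the online A/B test mentioned above.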
Finally, the summary reflects on challenges like user/item cold‑start, bias, frequency imbalance, and large‑sample learning, noting that graph models can alleviate some issues (e.g., debiasing via sampling) but still rely on feature‑based inference for completely isolated nodes.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.