Enhanced Graph Embedding with Side Information (EGES) for User Growth and Cold‑Start Mitigation
This article presents EGES (Enhanced Graph Embedding with Side Information), a graph‑embedding model that constructs a directed user graph, applies biased random‑walk sampling, and trains weighted Skip‑Gram embeddings enriched with side information, thereby improving large‑scale user acquisition and mitigating cold‑start problems in recommendation systems.
Background: User growth is critical for business, but traditional rule‑based or simple look‑alike methods struggle with scalability and with cold‑start users who lack interaction data.
Challenges: (1) Existing algorithms focus on precise but limited user groups, failing to expand reach; (2) New users provide no behavioral signals, making accurate prediction difficult.
Proposed Solution – EGES: EGES (Enhanced Graph Embedding with Side Information) builds a directed user graph from authorized spatio‑temporal data and side information (e.g., demographics). Random walks on this graph capture high‑order similarity, while side‑information weights are learned dynamically to enhance embedding quality.
User Graph Construction: Users are represented as nodes; sessions are defined by Geohash locations and time slots. Within each session, consecutive users are linked with edge weights equal to co‑occurrence counts (or functions of dwell time). All session graphs are merged to form a comprehensive directed graph.
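A minimal sketch of this construction step (the event records and field names are illustrative assumptions, not from the article): events are grouped into sessions keyed by (Geohash, time slot), consecutive users within a session become directed edges, and co‑occurrence counts accumulate across all sessions to form the merged graph.

```python
from collections import defaultdict

# Hypothetical event records (user_id, geohash, time_slot), ordered by timestamp.
events = [
    ("u1", "wx4g0", "08:00"), ("u2", "wx4g0", "08:00"), ("u3", "wx4g0", "08:00"),
    ("u2", "wx4g1", "09:00"), ("u1", "wx4g1", "09:00"),
]

# Group users into sessions, preserving arrival order.
# Session key = (geohash, time_slot).
sessions = defaultdict(list)
for user, geohash, slot in events:
    sessions[(geohash, slot)].append(user)

# Merge all session graphs into one directed graph:
# edge weight = co-occurrence count of consecutive user pairs.
edge_weights = defaultdict(int)
for users in sessions.values():
    for src, dst in zip(users, users[1:]):
        edge_weights[(src, dst)] += 1
# edge_weights now holds directed edges such as ("u1", "u2") -> count
```

In practice dwell time could replace the raw count as the weight, as the article notes; only the accumulation expression changes.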
Data Filtering: Noisy behaviors, such as rapid actions across geographically distant locations or abnormally short sessions, are removed to prevent graph contamination.
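As a minimal illustration of one such filter (the record layout and the 30‑second threshold are assumptions, not values from the article), abnormally short sessions can be dropped before graph construction:

```python
# Hypothetical session records: (user_id, start_ts, end_ts, geohash).
MIN_DWELL_SECONDS = 30  # assumed noise threshold, not from the article

raw_sessions = [
    ("u1", 1000, 1200, "wx4g0"),  # 200 s dwell -> keep
    ("u2", 1000, 1005, "wx4g0"),  # 5 s dwell   -> drop as noise
]

# Keep only sessions with a plausible dwell time; a production filter
# would also check for implausible travel speed between geohashes.
clean_sessions = [
    s for s in raw_sessions if (s[2] - s[1]) >= MIN_DWELL_SECONDS
]
```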
Random‑Walk Sampling: Edge weights are normalized to form a transition matrix. A biased node2vec‑style walk (with controllable DFS/BFS bias, walk length, and visit limits) generates user sequences, ensuring that even low‑activity users are sampled through their multiple edges.
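A sketch of such a biased walk (graph layout and parameter values are illustrative): the node2vec return parameter p and in‑out parameter q bias the walk toward BFS‑ or DFS‑like behavior, and transition probabilities are computed on the fly per step rather than materialized as a full matrix.

```python
import random


def node2vec_walk(adj, start, walk_length, p=1.0, q=1.0, rng=random):
    """One biased walk over adj: node -> {neighbor: edge_weight}.

    p, q are the node2vec return / in-out parameters; unnormalized
    transition probabilities are computed on the fly at each step,
    which avoids storing the full transition matrix.
    """
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        nbrs = adj.get(cur)
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step: sample proportionally to edge weight.
            weights = dict(nbrs)
        else:
            prev = walk[-2]
            weights = {}
            for n, w in nbrs.items():
                if n == prev:                 # return to previous node
                    weights[n] = w / p
                elif n in adj.get(prev, {}):  # distance 1 from prev (BFS-like)
                    weights[n] = w
                else:                         # distance 2 from prev (DFS-like)
                    weights[n] = w / q
        nodes = list(weights)
        walk.append(rng.choices(nodes, weights=[weights[n] for n in nodes])[0])
    return walk


# Tiny illustrative graph and one sampled sequence.
adj = {"u1": {"u2": 2.0}, "u2": {"u1": 1.0, "u3": 1.0}, "u3": {"u2": 1.0}}
walk = node2vec_walk(adj, "u1", walk_length=5, q=2.0, rng=random.Random(42))
```

Setting q > 1 keeps the walk local (BFS‑like); q < 1 pushes it outward (DFS‑like), matching the controllable bias the article describes.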
Embedding Training: Sequences are fed into a weighted Skip‑Gram model. Positive pairs consist of target users and context users within a sliding window; negative samples are drawn proportionally to node indegree using TensorFlow’s sampled softmax. Side‑information embeddings are aggregated with learned importance weights, and new users receive embeddings via average‑ or weight‑pooled side‑information vectors.
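The aggregation and cold‑start steps can be sketched as follows (a simplified, framework‑free illustration; the field names and vectors are made up). Each field j — the user ID plus each side‑information feature — has an embedding e_j and a learned scalar logit a_j; the final embedding is the softmax‑weighted average H = Σ_j exp(a_j)·e_j / Σ_j exp(a_j). A cold‑start user, lacking an ID embedding, is represented by pooling only its side‑information embeddings.

```python
import math


def aggregate_embeddings(field_embs, field_logits):
    """EGES-style aggregation: softmax over learned per-field logits a_j,
    then a weighted average of the field embeddings.

    field_embs: equal-length vectors (ID embedding + side-info embeddings).
    field_logits: one learned scalar a_j per field (assumed given here).
    """
    exp_w = [math.exp(a) for a in field_logits]
    total = sum(exp_w)
    dim = len(field_embs[0])
    return [
        sum(w * emb[i] for w, emb in zip(exp_w, field_embs)) / total
        for i in range(dim)
    ]


def cold_start_embedding(side_embs):
    """New user: no ID embedding exists, so pool only side-information
    embeddings (uniform average here; weighted pooling with learned
    logits works the same way via aggregate_embeddings)."""
    dim = len(side_embs[0])
    return [sum(e[i] for e in side_embs) / len(side_embs) for i in range(dim)]


# Equal logits reduce to a plain average of the two fields.
h = aggregate_embeddings([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
cold = cold_start_embedding([[2.0, 0.0], [0.0, 2.0]])
```

In the real model the logits a_j are trained jointly with the Skip‑Gram objective; here they are passed in as fixed values for clarity.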
Optimizations: 1) The choice between a directed and an undirected graph depends on the scenario (e.g., ad‑click paths favor directed edges). 2) Low‑weight edges are treated as graph noise and pruned. 3) To avoid out‑of‑memory errors, transition probabilities are computed on the fly rather than precomputed and stored.
Evaluation: In online A/B tests with 10k+ seed users, recalling a user set of comparable size with EGES nearly doubled the daily conversion rate, while larger recall sizes yielded diminishing returns, indicating suitability for expansion or promotional campaigns.
Conclusion & Future Work: Graph‑based embeddings outperform pure sequence models for sparse users, and random‑walk sampling surfaces plausible user sequences that never occurred verbatim in the logs. Future work includes handling multi‑value categorical side features, incorporating app‑level behavior data, and reducing the memory consumption of sampling.
Tongcheng Travel Technology Center