How Graph Embedding Boosts Cross-Category Bundle Recommendations on E‑Commerce
This article explains how graph embedding techniques, including a BSP-based distributed LINE implementation and a cross-category probabilistic graph model, are applied to improve the diversity and relevance of bundle (凑单, adding items to an order to reach a promotion threshold) recommendations during large-scale shopping events.
Background
In this year's bundle (凑单) scenario, the second-page recommendation added a "锦囊" (tips card) module to increase product richness and provide personalized category tags, and it also supported personalized recommendations for the Tmall "万券齐发" (mass coupon release) venue and for mixed-category subsidies. The focus remains on enhancing user exploration and the shopping experience by improving recommendation diversity and cross-category relevance. The rest of this article mainly discusses the Graph Embedding work, including attempts at a parallel algorithm and its applications.
Algorithm
Problem Abstraction and Description
The basic user-item purchase relationship on an e-commerce platform can be modeled as a bipartite graph of users and items. In the original figure, solid blue edges represent direct interactions (click, purchase), while dashed black edges capture item-item relationships derived from shared user behaviors. If node attributes are considered, the bipartite graph becomes a more complex attributed graph.
To compute item‑to‑item (I2I) relationships, a common approach converts the bipartite graph into a homogeneous item graph using memory‑based collaborative filtering (e.g., Adamic‑Adar, Swing) or samples weighted random walks to generate co‑occurrence samples, then trains item embeddings with Skip‑Gram models (DeepWalk, Node2Vec, LINE). These embeddings are used for link prediction and classification.
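As a minimal sketch of the first route (collapsing the bipartite graph into an item graph), the snippet below scores item pairs with an Adamic-Adar style weight: each user who interacted with both items contributes 1/log(degree), so heavy users count less. The function name `adamic_adar_i2i` and the toy data are illustrative assumptions, not the production implementation.

```python
import math
from collections import defaultdict
from itertools import combinations

def adamic_adar_i2i(user_items):
    """user_items: dict mapping user id -> set of item ids the user interacted with."""
    scores = defaultdict(float)
    for items in user_items.values():
        degree = len(items)
        if degree < 2:
            continue  # a user with a single item links no item pair
        weight = 1.0 / math.log(degree)  # heavy users contribute less per pair
        for a, b in combinations(sorted(items), 2):
            scores[(a, b)] += weight
    return dict(scores)

# Toy interaction data (hypothetical).
toy = {
    "u1": {"painting", "wall_sticker"},
    "u2": {"painting", "wall_sticker", "kettle"},
    "u3": {"kettle"},
}
i2i = adamic_adar_i2i(toy)
```

The resulting weighted item pairs can serve directly as I2I scores or as edge weights for the random-walk sampling mentioned above.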
In the bundle scenario, recommending items that the user has already added to the cart can be counter-productive; therefore, the recommendation emphasizes cross-category diversity. Building on last year's Graph Embedding deployment, this year we strengthened cross-category training, experimented with a BSP-based distributed LINE implementation, and designed a cross-category probabilistic graph model.
Distributed LINE Implementation on BSP
SGNS (Skip-Gram with Negative Sampling) is the classic Word2Vec model and is widely adopted in Graph Embedding. LINE combines first-order and second-order proximity to learn node vectors. Its objective functions O1 (first-order) and O2 (second-order) are optimized with negative sampling.
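For reference, the standard form of these objectives as given in the LINE paper (Tang et al., 2015) is sketched below. The original article's formula image is not available, so the notation here is an assumption: $\vec{u}_i$ is the vector of vertex $v_i$, $\vec{u}'_j$ is the context vector of $v_j$, $w_{ij}$ is the edge weight, $K$ is the number of negative samples, and $d_v$ is the degree of $v$.

```latex
% First-order and second-order proximity objectives
O_1 = -\sum_{(i,j)\in E} w_{ij}\,\log p_1(v_i, v_j),
\qquad
p_1(v_i, v_j) = \sigma\!\left(\vec{u}_i^{\top}\vec{u}_j\right)

O_2 = -\sum_{(i,j)\in E} w_{ij}\,\log p_2(v_j \mid v_i),
\qquad
p_2(v_j \mid v_i) = \frac{\exp\!\left(\vec{u}'^{\top}_j \vec{u}_i\right)}
                         {\sum_{k=1}^{|V|}\exp\!\left(\vec{u}'^{\top}_k \vec{u}_i\right)}

% Per-edge negative-sampling surrogate for \log p_2(v_j \mid v_i)
\log\sigma\!\left(\vec{u}'^{\top}_j \vec{u}_i\right)
+ \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)}
  \left[\log\sigma\!\left(-\vec{u}'^{\top}_n \vec{u}_i\right)\right],
\qquad P_n(v) \propto d_v^{3/4}
```

The first term of the surrogate touches only an observed edge, while the second term draws vertices from the whole graph.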
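In a BSP framework, the positive-sample term of the objective (the first part), which only involves a vertex and its neighbors, parallelizes well, while the negative-sample term (the second part), which requires sampling from the global vertex set, is harder to distribute. Prior work introduced Target Negative Sampling to parallelize negative sampling across partitions.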
Using the Odps-Graph BSP framework, we implemented a distributed LINE algorithm where vertices store node vectors, and negative sampling is confined to each worker, with inter-worker messages providing approximate global sampling. Gradient updates for positive and negative samples are performed in two separate super-steps via vertex messaging.
[Figure: pseudo-code of the distributed LINE implementation]
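The two-super-step scheme described here can be illustrated with a small single-process simulation. This is a sketch under several assumptions, not the authors' pseudo-code and not the Odps-Graph API: the function `train_line_bsp`, the `v % num_workers` partitioning rule, uniform (rather than degree-based) negative sampling, and the exclusion of a vertex's own neighbors from its negatives are all hypothetical choices, and the inter-worker exchange that approximates global sampling is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_line_bsp(edges, num_nodes, dim=4, num_workers=2, num_neg=2,
                   lr=0.05, rounds=100, seed=0):
    rng = random.Random(seed)
    # Each vertex owns a node vector and a context vector (second-order LINE).
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_nodes)]
    ctx = [[0.0] * dim for _ in range(num_nodes)]
    out_nbrs = [[] for _ in range(num_nodes)]
    for i, j in edges:
        out_nbrs[i].append(j)
    # Assumed partitioning rule: vertex v lives on worker v % num_workers.
    local = [[v for v in range(num_nodes) if v % num_workers == w]
             for w in range(num_workers)]

    for _ in range(rounds):
        # Superstep 1 (positive samples): every vertex i sends its vector to
        # each out-neighbour j; j updates its context vector and replies with
        # the gradient for i, which i only applies after the barrier.
        inbox = [[] for _ in range(num_nodes)]
        for i in range(num_nodes):
            for j in out_nbrs[i]:
                inbox[j].append((i, list(emb[i])))
        grad = [[0.0] * dim for _ in range(num_nodes)]
        for j in range(num_nodes):
            for i, vec_i in inbox[j]:
                g = lr * (1.0 - sigmoid(dot(vec_i, ctx[j])))
                for d in range(dim):
                    grad[i][d] += g * ctx[j][d]
                    ctx[j][d] += g * vec_i[d]

        # Superstep 2 (negative samples): every vertex draws negatives only
        # from vertices on its own worker, then applies all its gradients.
        for w in range(num_workers):
            for i in local[w]:
                banned = set(out_nbrs[i]) | {i}
                cands = [v for v in local[w] if v not in banned]
                if cands:
                    for _ in range(num_neg * len(out_nbrs[i])):
                        n = rng.choice(cands)
                        g = lr * (0.0 - sigmoid(dot(emb[i], ctx[n])))
                        for d in range(dim):
                            grad[i][d] += g * ctx[n][d]
                            ctx[n][d] += g * emb[i][d]
                for d in range(dim):
                    emb[i][d] += grad[i][d]
    return emb, ctx

# Toy graph: vertices 4 and 5 are isolated and only ever act as negatives.
edges = [(0, 1), (1, 0), (2, 3), (3, 2), (0, 2), (2, 0)]
emb, ctx = train_line_bsp(edges, num_nodes=6)
```

Keeping negative sampling local avoids a global sampling table, at the cost of a biased negative distribution; per the description above, the actual implementation relies on inter-worker messages to reduce that bias.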
Cross‑Category Probabilistic Graph Model
Traditional Graph Embedding models treat any two nodes equally when computing similarity. For bundle recommendations, we need to emphasize cross-category learning. Inspired by the RaRE algorithm, we extend it by weakening similarity between items of different categories based on category distance, and we embed category vectors themselves.
The probabilistic graph model incorporates item embeddings, category embeddings, and cross‑category embeddings, trained via a MAP objective. The model captures that when a user interacts with two items, part of the similarity may stem from shared category attributes rather than pure item embeddings.
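The article does not give the exact form of the model, so the following is only a hedged sketch of the idea that co-occurrence is explained partly by item vectors and partly by category vectors. The additive logit in `edge_prob`, the Gaussian-prior (L2) form of the MAP objective in `neg_log_posterior`, and all names and toy values are assumptions; the separate cross-category embedding mentioned above is left out because its definition is not specified.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edge_prob(i, j, item_vec, cat_vec, item_cat):
    """Assumed form: item-level affinity plus category-level affinity."""
    item_term = dot(item_vec[i], item_vec[j])
    cat_term = dot(cat_vec[item_cat[i]], cat_vec[item_cat[j]])
    return sigmoid(item_term + cat_term)

def neg_log_posterior(pairs, item_vec, cat_vec, item_cat, l2=0.01):
    """pairs: list of (i, j, label); label 1 for co-occurrence, 0 otherwise."""
    nll = 0.0
    for i, j, y in pairs:
        p = edge_prob(i, j, item_vec, cat_vec, item_cat)
        nll -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    # Zero-mean Gaussian priors on all vectors reduce to an L2 penalty.
    penalty = sum(x * x for v in item_vec.values() for x in v)
    penalty += sum(x * x for v in cat_vec.values() for x in v)
    return nll + l2 * penalty

# Toy values (hypothetical): item vectors are zero, so any predicted affinity
# comes from the category vectors alone.
item_cat = {"coat": "apparel", "dress": "apparel", "mascara": "beauty"}
item_vec = {"coat": [0.0, 0.0], "dress": [0.0, 0.0], "mascara": [0.0, 0.0]}
cat_vec = {"apparel": [1.0, 0.0], "beauty": [0.0, 1.0]}

same_cat = edge_prob("coat", "dress", item_vec, cat_vec, item_cat)
cross_cat = edge_prob("coat", "mascara", item_vec, cat_vec, item_cat)
loss = neg_log_posterior([("coat", "mascara", 1)], item_vec, cat_vec, item_cat)
```

In this toy setup the same-category pair already scores above 0.5 with all-zero item vectors, which illustrates how a category term can absorb same-category similarity and leave the item vectors free to encode cross-category signal.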
Practical Cases
From a modern decorative painting, the system recalls oil paintings, switch stickers, tableware, water kettles, wall stickers, etc., across categories.
From a trench coat, the system recalls facial cream, mascara, BB cream, earrings, dresses, and other cross‑category items.
Summary
Graph Embedding is a crucial branch of graph learning that represents nodes with vectors, enabling the capture of high‑order relationships beyond first and second order. It improves recommendation richness and novelty. Ongoing research in the company’s algorithm and system teams continues to deepen these techniques.
Outlook
Future work will incorporate more attribute features (product, user) into the graph, explore meta‑path based embeddings, and integrate embedding‑based I2I retrieval with ranking models. Improving the completeness of bundle entry points and reducing repetitive exposure will also be key research directions.
Project Summary
This year the bundle project upgraded both system and algorithm components, deploying deep learning models, group‑knapsack optimization, cross‑category graph models, and real‑time LTR for weight learning, resulting in notable increases in payment amount and conversion rates.
References
[1] Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211-230.
[2] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., & Mei, Q. (2015). LINE: Large-scale Information Network Embedding. WWW, 2015.
[3] Gu, Y., Sun, Y., Li, Y., & Yang, Y. (2018). RaRE: Social Rank Regulated Large-scale Network Embedding. WWW, 2018.
[4] Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. KDD, 2014.
[5] Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. KDD, 2016.
[6] Stergiou, S., Straznickas, Z., Wu, R., & Tsioutsiouliklis, K. (2017). Distributed Negative Sampling for Word Embeddings. AAAI, 2017.
Alibaba Cloud Developer