Content Collaborative Graph Neural Network for Large‑Scale E‑commerce Search
CC‑GNN addresses three drawbacks of existing GNN‑based retrieval for e‑commerce search by adding content phrase nodes, scalable MetaPath‑based message passing, and difficulty‑aware noisy contrastive learning with counterfactual augmentation. It improves recall by up to 16 %, with notably larger gains on long‑tail queries and cold‑start items.
Abstract: Existing graph neural network (GNN) based retrieval models for e‑commerce search achieve strong performance but suffer from three major drawbacks: they do not fully exploit product textual content, they are inefficient on industrial‑scale sparse graphs, and they perform poorly on long‑tail queries and cold‑start items. To address these issues, the paper proposes a Content Collaborative Graph Neural Network (CC‑GNN) that explicitly incorporates content phrase nodes into graph propagation, introduces a scalable MetaPath‑based message passing scheme, and employs difficulty‑aware noisy graph contrastive learning together with counterfactual data augmentation for both supervised and self‑supervised training.
Background: Vector‑based retrieval is widely used in product recall, yet GNN‑based encoders are limited by insufficient semantic modeling of product titles, high computational cost on massive graphs, and inadequate handling of sparse long‑tail interactions. Semantic drift of phrases further degrades relevance estimation.
Method: CC‑GNN constructs a content‑collaborative graph by extracting variable‑length phrase nodes from product titles. Candidate phrases are generated from N‑grams that co‑occur frequently in historical query‑item logs, then pruned using a domain‑specific NER model and a scoring table. Two MetaPath types (click‑based and phrase‑based) guide separate sub‑graph sampling for queries and items, and the sampled sub‑graphs are aggregated by parallel Graph Attention Networks (GATs). Supervised learning is enhanced with counterfactual positive samples for long‑tail queries and cold‑start items. For self‑supervised contrastive learning, Difficulty‑Aware Representation Perturbation (DARP) adds noise proportional to node degree, and Counterfactual Data Supplement in Contrastive Learning (CDS‑CL) injects synthetic positive/negative pairs to bridge head‑tail gaps. The final loss combines supervised, contrastive, and counterfactual contrastive terms.
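To make the degree-dependent perturbation concrete, here is a minimal sketch of the idea behind DARP, not the paper's implementation. It assumes the noise norm is `epsilon * degree / max_degree`; the summary only states that noise is proportional to node degree, so this exact form, the function names `perturb_embedding` and `noise_norm`, and the default `epsilon` are all hypothetical.

```python
import math
import random


def perturb_embedding(embedding, degree, max_degree, epsilon=0.1, rng=None):
    """Return a noisy copy of `embedding` whose noise norm grows with node degree.

    Assumption: noise norm = epsilon * degree / max_degree. The source only says
    the noise is proportional to node degree; this exact form is hypothetical.
    """
    if rng is None:
        rng = random.Random()
    # Draw a random direction and normalize it to unit length.
    direction = [rng.gauss(0.0, 1.0) for _ in embedding]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Scale the unit direction by the degree-dependent magnitude.
    scale = epsilon * degree / max_degree
    return [x + scale * d / norm for x, d in zip(embedding, direction)]


def noise_norm(original, perturbed):
    """Euclidean distance between an embedding and its perturbed view."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(original, perturbed)))


rng = random.Random(0)
emb = [0.5, -0.2, 0.1, 0.9]
# A high-degree (head) node receives the full perturbation budget.
head_view = perturb_embedding(emb, degree=1000, max_degree=1000, rng=rng)
# A low-degree (tail) node receives a much smaller perturbation.
tail_view = perturb_embedding(emb, degree=10, max_degree=1000, rng=rng)
```

In a contrastive setup, the original and perturbed embeddings of the same node would serve as two views of a positive pair. Under this assumed scaling, well-connected nodes face a harder contrastive task than sparsely connected ones.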
Experiments: CC‑GNN was evaluated on an industrial product‑query dataset (≈350 M queries, 870 M items, 7 B interactions). Compared with state‑of‑the‑art baselines (e.g., LasGNN), CC‑GNN improves overall recall metrics by up to 16 % and yields larger gains on long‑tail queries (+13.7 % Recall) and cold‑start items (+13.5 % NDCG). Ablation studies confirm the effectiveness of the content‑collaborative graph, DARP, and CDS‑CL. Additional tests on the Amazon Sports recommendation dataset show consistent performance boosts when CC‑GNN components are plugged into various base models (VBPR, MMGCN, SLMRec, etc.).
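For readers less familiar with the reported metrics, the sketch below shows one common way to compute Recall@K and binary-relevance NDCG@K for a single query. The summary does not give the paper's exact metric definitions or cutoff values, so the formulas, the function names, and the toy item IDs here are illustrative assumptions. Ranked items are assumed to be unique.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places every relevant item at the top positions.
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example with hypothetical item IDs.
ranked = ["item_a", "item_b", "item_c", "item_d"]
relevant = {"item_b", "item_d", "item_e"}
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Recall rewards retrieving relevant items anywhere in the top K, while NDCG additionally rewards placing them near the top.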
Conclusion: CC‑GNN provides an efficient, content‑aware graph learning framework that significantly enhances e‑commerce recall, mitigates semantic drift, and alleviates long‑tail and cold‑start challenges. Its modular components can be plugged into a broad range of GNN‑based retrieval and recommendation systems.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.