Content Collaborative Graph Neural Network for Large‑Scale E‑commerce Search

CC‑GNN addresses three drawbacks of existing GNN‑based retrieval for e‑commerce search by adding content phrase nodes, scalable meta‑path message passing, and difficulty‑aware noisy contrastive learning with counterfactual augmentation. It improves recall by up to 16 %, with notably larger gains on long‑tail queries and cold‑start items.

Alimama Tech

Sep 12, 2023

Abstract: Existing graph neural network (GNN) based retrieval models for e‑commerce search achieve strong performance but suffer from three major drawbacks: they do not fully exploit product textual content, they are inefficient on industrial‑scale sparse graphs, and they perform poorly on long‑tail queries and cold‑start items. To address these issues, the paper proposes a Content Collaborative Graph Neural Network (CC‑GNN) that explicitly incorporates content phrase nodes into graph propagation, introduces a scalable MetaPath‑based message passing scheme, and employs difficulty‑aware noisy graph contrastive learning together with counterfactual data augmentation for both supervised and self‑supervised training.
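
The difficulty‑aware noisy contrastive learning named above can be illustrated with a small sketch. In CC‑GNN, Difficulty‑Aware Representation Perturbation (DARP) adds noise proportional to node degree. This summary does not give the exact formulas, so the sketch assumes (hypothetically) that the noise is a random direction scaled to length `base_eps * degree / max_degree`, and it uses the standard InfoNCE loss between two perturbed views as the contrastive objective. All names (`perturb`, `info_nce`, `two_noisy_views`, `base_eps`) are illustrative, not the paper's code.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def perturb(emb, degree, max_degree, base_eps, rng):
    """Add a random noise vector of length base_eps * degree / max_degree.

    ASSUMPTION: linear degree scaling is illustrative; the summary only says
    the noise is proportional to node degree.
    """
    eps = base_eps * degree / max_degree
    direction = l2_normalize([rng.gauss(0.0, 1.0) for _ in emb])
    return [x + eps * d for x, d in zip(emb, direction)]


def info_nce(view_a, view_b, temperature=0.2):
    """Mean InfoNCE loss: row i of view_a must pick row i of view_b among all rows."""
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def two_noisy_views(embs, degrees, base_eps=0.1, seed=0):
    """Build two independently perturbed views of the same node embeddings."""
    rng = random.Random(seed)
    max_degree = max(max(degrees), 1)  # guard against an all-zero degree list
    view_1 = [perturb(e, d, max_degree, base_eps, rng) for e, d in zip(embs, degrees)]
    view_2 = [perturb(e, d, max_degree, base_eps, rng) for e, d in zip(embs, degrees)]
    return view_1, view_2


if __name__ == "__main__":
    node_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    node_degrees = [500, 20, 1]  # toy head, torso, and long-tail nodes
    v1, v2 = two_noisy_views(node_embs, node_degrees)
    print(f"contrastive loss: {info_nce(v1, v2):.4f}")
```

Under this linear scaling, high-degree (head) nodes are pushed further from their original embeddings than long‑tail nodes, which is one plausible reading of "difficulty‑aware"; the paper's actual schedule may differ.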

Background: Vector‑based retrieval is widely used in product recall, yet GNN‑based encoders are limited by insufficient semantic modeling of product titles, high computational cost on massive graphs, and inadequate handling of sparse long‑tail interactions. Semantic drift of phrases further degrades relevance estimation.

Method: CC‑GNN constructs a content‑collaborative graph by extracting variable‑length phrase nodes from product titles. Candidate phrases are generated from frequently co‑occurring N‑grams in historical query‑item logs and then pruned using domain‑specific named entity recognition (NER) and a scoring table.

Two MetaPath types, click‑based and phrase‑based, guide separate sub‑graph sampling for queries and for items, and the sampled sub‑graphs are aggregated by parallel Graph Attention Networks (GATs).

Supervised learning is augmented with counterfactual positive samples for long‑tail queries and cold‑start items. For self‑supervised contrastive learning, Difficulty‑Aware Representation Perturbation (DARP) adds noise proportional to node degree, and Counterfactual Data Supplement in Contrastive Learning (CDS‑CL) injects synthetic positive/negative pairs to narrow the gap between head and long‑tail nodes. The final loss combines the supervised, contrastive, and counterfactual contrastive terms.
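
The phrase‑candidate step described above (frequent co‑occurring N‑grams from query‑item logs, followed by pruning) can be sketched as follows. This is a minimal illustration under stated assumptions: "co‑occurring" is read here as "an N‑gram that appears both in a query and in the title of an item that query clicked", tokens are split on whitespace, and the NER and scoring‑table pruning is replaced by a hypothetical lookup of precomputed phrase scores. The log format and all function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_phrases(click_log, max_n=3, min_count=2, stopwords=frozenset()):
    """Count title n-grams that also appear in a query that clicked that title.

    click_log: iterable of (query, item_title) pairs (hypothetical log format).
    Returns {phrase: count} for phrases seen in at least min_count clicks.
    """
    counts = Counter()
    for query, title in click_log:
        q_tokens, t_tokens = query.lower().split(), title.lower().split()
        for n in range(1, max_n + 1):
            q_grams = set(ngrams(q_tokens, n))
            # set(): a phrase repeated within one title counts once per click
            for gram in set(ngrams(t_tokens, n)):
                if gram in q_grams and not any(tok in stopwords for tok in gram):
                    counts[" ".join(gram)] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


def prune(candidates, score_table, threshold=0.5):
    """HYPOTHETICAL stand-in for NER + scoring-table pruning: keep phrases scoring >= threshold."""
    return sorted(p for p in candidates if score_table.get(p, 0.0) >= threshold)


if __name__ == "__main__":
    log = [
        ("red running shoes", "Nike red running shoes men"),
        ("running shoes", "Adidas running shoes for women"),
        ("red dress", "summer red dress"),
    ]
    cands = candidate_phrases(log)
    print(sorted(cands.items()))  # [('red', 2), ('running', 2), ('running shoes', 2), ('shoes', 2)]
    scores = {"running shoes": 0.9, "shoes": 0.6, "red": 0.2}  # hypothetical scores
    print(prune(cands, scores))  # ['running shoes', 'shoes']
```

The surviving phrases would become phrase nodes linked to the items whose titles contain them; a production system would use a trained NER model and the paper's own scoring table, whose contents this summary does not give.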

Experiments: CC‑GNN was evaluated on an industrial product‑query dataset (≈350 M queries, 870 M items, 7 B interactions). Compared with state‑of‑the‑art baselines (e.g., LasGNN), CC‑GNN improves overall recall metrics by up to 16 % and yields larger gains on long‑tail queries (+13.7 % Recall) and cold‑start items (+13.5 % NDCG). Ablation studies confirm the effectiveness of the content‑collaborative graph, DARP, and CDS‑CL. Additional tests on the Amazon Sports recommendation dataset show consistent performance boosts when CC‑GNN components are plugged into various base models (VBPR, MMGCN, SLMRec, etc.).
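
The Recall and NDCG figures above are standard ranking metrics. This summary does not state the cutoff K, how relevance labels are defined, or how the percentage gains are computed, so the sketch below uses the common binary‑relevance definitions for a single query; in practice these values are averaged over all evaluation queries. The toy item IDs are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear among the top-k retrieved items."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)  # pos is 0-based, so the discount is log2(rank + 1)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    retrieved = ["i3", "i7", "i1", "i9"]  # toy ranked list for one query
    clicked = {"i1", "i3"}                 # toy ground-truth items
    print(recall_at_k(retrieved, clicked, k=3))          # 1.0
    print(round(ndcg_at_k(retrieved, clicked, k=3), 4))  # 0.9197
```

Recall only checks whether relevant items are retrieved at all, while NDCG also rewards placing them near the top, which is why the two metrics can move by different amounts for the same model.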

Conclusion: CC‑GNN provides an efficient, content‑aware graph learning framework that significantly improves e‑commerce recall, mitigates semantic drift, and alleviates long‑tail and cold‑start problems. Its modular components can be plugged into a broad range of GNN‑based retrieval and recommendation systems, as the Amazon Sports experiments with several base models indicate.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
