Artificial Intelligence · 12 min read

Graph-based Weakly Supervised Framework for Semantic Relevance Learning in E-commerce

The paper introduces a graph‑based weakly supervised contrastive learning framework that uses heterogeneous user‑behavior graphs, e‑commerce‑specific augmentations, and a hybrid fine‑tuning/transfer learning strategy to improve semantic relevance matching between queries and product titles, achieving significant gains on a large‑scale Taobao dataset.

Alimama Tech

Abstract: Product search is fundamental in e‑commerce, and the relevance between a user query and a product title can be evaluated by semantic matching. Key challenges include the long‑tail query distribution and the lack of high‑quality semantic supervision. This work proposes a weakly supervised contrastive learning framework that leverages heterogeneous user‑behavior graphs to generate semantic‑aware training data, together with e‑commerce‑specific data augmentations and training objectives. A hybrid post‑cross computation combines fine‑tuning and transfer learning to mitigate data distribution bias. Experiments on a large‑scale Taobao dataset show significant improvements.

Background: Text representation learning underpins many NLP tasks. In e‑commerce, queries are short and focused while product titles are long and noisy, making direct matching difficult. Manual annotation is costly, and although user behavior provides weak relevance signals, those signals are noisy. This motivates a graph‑based weak supervision approach.

Method:

1. Graph‑based Data Construction: Build a bipartite user‑behavior graph with query and item nodes linked by clicks, purchases, etc. Use meta‑paths and pointwise mutual information as edge weights. Perform node2vec‑style random walks to sample positive and negative pairs with controllable semantic difficulty.
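The graph‑construction step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the click log, node names, and walk length are all made up, and the node2vec return/in‑out parameters are fixed at 1 (a plain weighted walk) for brevity.

```python
import math
import random
from collections import defaultdict

# Hypothetical click log: (query, item) pairs standing in for user behavior.
clicks = [
    ("red dress", "itemA"), ("red dress", "itemB"),
    ("summer dress", "itemB"), ("summer dress", "itemC"),
    ("phone case", "itemD"), ("phone case", "itemE"),
    ("iphone case", "itemD"),
]

# Co-occurrence counts used for PMI edge weights on the bipartite graph.
pair_count = defaultdict(int)
q_count, i_count = defaultdict(int), defaultdict(int)
for q, it in clicks:
    pair_count[(q, it)] += 1
    q_count[q] += 1
    i_count[it] += 1
total = len(clicks)

def pmi(q, it):
    """Pointwise mutual information of a query-item edge."""
    p_qi = pair_count[(q, it)] / total
    return math.log(p_qi / ((q_count[q] / total) * (i_count[it] / total)))

# Adjacency list with PMI weights; only positive-PMI edges are kept.
adj = defaultdict(list)
for (q, it), _ in pair_count.items():
    w = pmi(q, it)
    if w > 0:
        adj[q].append((it, w))
        adj[it].append((q, w))

def random_walk(start, length=4):
    """Weighted random walk alternating query/item nodes (node2vec with p=q=1)."""
    path = [start]
    for _ in range(length):
        nbrs = adj[path[-1]]
        if not nbrs:
            break
        nodes, weights = zip(*nbrs)
        path.append(random.choices(nodes, weights=weights, k=1)[0])
    return path

# Query nodes reached on a short walk from a query become weak positives;
# longer walks (or walks from unrelated components) yield harder negatives.
walk = random_walk("red dress")
positives = [n for n in walk[1:] if n in q_count]
```

Controlling semantic difficulty then amounts to tuning walk length and edge‑weight thresholds: short, high‑PMI walks give easy positives, while distant nodes supply hard negatives.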

2. Contrastive Learning Framework: Based on MoCo, incorporate e‑commerce‑specific augmentations (Mix‑up, adversarial attacks) and a cluster‑level contrastive objective that treats groups of related queries as a semantic unit.
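The core of the MoCo‑style objective is an InfoNCE loss over a queue of negatives, with mixup applied as one augmentation. The sketch below uses random NumPy vectors in place of real encoder outputs, and the mixup formulation on embeddings is an assumption; the paper's exact augmentation details and the momentum encoder update are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, queue_size, tau = 8, 16, 0.07  # illustrative sizes, not paper values

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy encoder outputs: in MoCo these would come from the query encoder and a
# momentum-updated key encoder; random vectors stand in here.
q = normalize(rng.normal(size=(4, dim)))                 # query embeddings
k_pos = normalize(q + 0.1 * rng.normal(size=(4, dim)))   # augmented positives
queue = normalize(rng.normal(size=(queue_size, dim)))    # negative queue

def mixup(a, b, alpha=0.2):
    """Mix-up on embeddings (assumed form of the augmentation)."""
    lam = rng.beta(alpha, alpha)
    return normalize(lam * a + (1 - lam) * b)

def info_nce(q, k_pos, queue, tau=0.07):
    """InfoNCE loss with a MoCo-style negative queue."""
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True) / tau   # (B, 1)
    l_neg = q @ queue.T / tau                                # (B, K)
    logits = np.concatenate([l_pos, l_neg], axis=1)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()

loss = info_nce(q, mixup(k_pos, k_pos[::-1]), queue, tau)
```

The cluster‑level objective would replace the single positive key with a centroid (or set) of related queries, so the pull is toward a semantic cluster rather than one augmented view.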

3. Cross‑Computation: Combine multi‑granularity fine‑tuning (using a small set of labeled pairs and high‑confidence behavior samples) with relation‑based transfer learning that matches a query with the set of neighboring queries of an item, reducing distribution shift.
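The relation‑based transfer idea can be sketched as scoring a query against an item's neighboring queries rather than against its title, so both sides of the match live in the query distribution. The mean‑similarity pooling below is an assumed aggregation; the paper's exact scoring function is not specified here, and the embeddings are random placeholders for pre‑trained encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder embeddings; in practice these come from the contrastively
# pre-trained query encoder.
query_vec = normalize(rng.normal(size=dim))
# Queries that historically led users to the item (its graph neighbors).
neighbor_queries = normalize(rng.normal(size=(5, dim)))

def relation_score(q_vec, neighbor_vecs):
    """Relevance of a query to an item via the item's neighboring queries.
    Representing the item by the queries that reach it keeps the comparison
    query-to-query, sidestepping the query-title distribution shift.
    Mean cosine similarity is an assumed pooling choice."""
    sims = neighbor_vecs @ q_vec
    return float(sims.mean())

score = relation_score(query_vec, neighbor_queries)
```

In the full hybrid scheme this transfer signal is combined with fine‑tuning on the small labeled set and high‑confidence behavior samples.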

Experiments: Comparative results on AUC, F1‑score, and the KS (Kolmogorov–Smirnov) statistic show the proposed model outperforms baselines such as Sentence‑BERT, a click‑trained model, and MASM. Ablation studies confirm the individual contributions of graph‑derived weak supervision, hard negative sampling, and cluster‑level contrast.

Conclusion: The weakly supervised pre‑training framework effectively captures semantic relevance in e‑commerce search, alleviating long‑tail and annotation scarcity issues. The hybrid fine‑tuning and transfer learning strategy further bridges query‑item distribution gaps, leading to measurable gains in online relevance metrics.

Tags: contrastive learning, information retrieval, e-commerce, graph neural networks, semantic relevance, weak supervision
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
