Artificial Intelligence · 12 min read

Graph-based Weakly Supervised Framework for Semantic Relevance Learning in E-commerce

The paper introduces a graph‑based weakly supervised contrastive learning framework that uses heterogeneous user‑behavior graphs, e‑commerce‑specific augmentations, and a hybrid fine‑tuning/transfer learning strategy to improve semantic relevance matching between queries and product titles, achieving significant gains on a large‑scale Taobao dataset.

Alimama Tech

Abstract: Product search is fundamental in e‑commerce, and the relevance between a user query and a product title can be evaluated by semantic matching. Key challenges include the long‑tail query distribution and the lack of high‑quality semantic supervision. This work proposes a weakly supervised contrastive learning framework that leverages heterogeneous user‑behavior graphs to generate semantic‑aware training data, together with e‑commerce‑specific data augmentations and training objectives. A hybrid post‑cross computation combines fine‑tuning and transfer learning to mitigate data distribution bias. Experiments on a large‑scale Taobao dataset show significant improvements.

Background: Text representation learning underpins many NLP tasks. In e‑commerce, queries are short and focused while product titles are long and noisy, making direct matching difficult. Manual annotation is costly, and although user behavior provides weak relevance signals, those signals are noisy. This motivates a graph‑based weak supervision approach.

Method:

1. Graph‑based Data Construction: Build a bipartite user‑behavior graph with query and item nodes linked by clicks, purchases, etc. Use meta‑paths and pointwise mutual information as edge weights. Perform node2vec‑style random walks to sample positive and negative pairs with controllable semantic difficulty.
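The graph‑construction step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the click log, node names, and walk length are all made up, and the node2vec return/in‑out parameters are fixed at 1 (a plain weighted walk) for brevity.

```python
import math
import random
from collections import defaultdict

# Hypothetical click log: (query, item) pairs standing in for user behavior.
clicks = [
    ("red dress", "itemA"), ("red dress", "itemB"),
    ("summer dress", "itemB"), ("summer dress", "itemC"),
    ("phone case", "itemD"), ("phone case", "itemE"),
    ("iphone case", "itemD"),
]

# Co-occurrence counts used for PMI edge weights on the bipartite graph.
pair_count = defaultdict(int)
q_count, i_count = defaultdict(int), defaultdict(int)
for q, it in clicks:
    pair_count[(q, it)] += 1
    q_count[q] += 1
    i_count[it] += 1
total = len(clicks)

def pmi(q, it):
    """Pointwise mutual information of a query-item edge."""
    p_qi = pair_count[(q, it)] / total
    return math.log(p_qi / ((q_count[q] / total) * (i_count[it] / total)))

# Adjacency list with PMI weights; only positive-PMI edges are kept.
adj = defaultdict(list)
for (q, it), _ in pair_count.items():
    w = pmi(q, it)
    if w > 0:
        adj[q].append((it, w))
        adj[it].append((q, w))

def random_walk(start, length=4):
    """Weighted random walk alternating query/item nodes (node2vec with p=q=1)."""
    path = [start]
    for _ in range(length):
        nbrs = adj[path[-1]]
        if not nbrs:
            break
        nodes, weights = zip(*nbrs)
        path.append(random.choices(nodes, weights=weights, k=1)[0])
    return path

# Query nodes reached on a short walk from a query become weak positives;
# longer walks (or walks from unrelated components) yield harder negatives.
walk = random_walk("red dress")
positives = [n for n in walk[1:] if n in q_count]
```

Controlling semantic difficulty then amounts to tuning walk length and edge‑weight thresholds: short, high‑PMI walks give easy positives, while distant nodes supply hard negatives.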

2. Contrastive Learning Framework: Based on MoCo, incorporate e‑commerce‑specific augmentations (Mix‑up, adversarial attacks) and a cluster‑level contrastive objective that treats groups of related queries as a semantic unit.
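The core of the MoCo‑style objective is an InfoNCE loss over a queue of negatives, with mixup applied as one augmentation. The sketch below uses random NumPy vectors in place of real encoder outputs, and the mixup formulation on embeddings is an assumption; the paper's exact augmentation details and the momentum encoder update are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, queue_size, tau = 8, 16, 0.07  # illustrative sizes, not paper values

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy encoder outputs: in MoCo these would come from the query encoder and a
# momentum-updated key encoder; random vectors stand in here.
q = normalize(rng.normal(size=(4, dim)))                 # query embeddings
k_pos = normalize(q + 0.1 * rng.normal(size=(4, dim)))   # augmented positives
queue = normalize(rng.normal(size=(queue_size, dim)))    # negative queue

def mixup(a, b, alpha=0.2):
    """Mix-up on embeddings (assumed form of the augmentation)."""
    lam = rng.beta(alpha, alpha)
    return normalize(lam * a + (1 - lam) * b)

def info_nce(q, k_pos, queue, tau=0.07):
    """InfoNCE loss with a MoCo-style negative queue."""
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True) / tau   # (B, 1)
    l_neg = q @ queue.T / tau                                # (B, K)
    logits = np.concatenate([l_pos, l_neg], axis=1)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()

loss = info_nce(q, mixup(k_pos, k_pos[::-1]), queue, tau)
```

The cluster‑level objective would replace the single positive key with a centroid (or set) of related queries, so the pull is toward a semantic cluster rather than one augmented view.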

3. Cross‑Computation: Combine multi‑granularity fine‑tuning (using a small set of labeled pairs and high‑confidence behavior samples) with relation‑based transfer learning that matches a query with the set of neighboring queries of an item, reducing distribution shift.
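The relation‑based transfer idea can be sketched as scoring a query against an item's neighboring queries rather than against its title, so both sides of the match live in the query distribution. The mean‑similarity pooling below is an assumed aggregation; the paper's exact scoring function is not specified here, and the embeddings are random placeholders for pre‑trained encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Placeholder embeddings; in practice these come from the contrastively
# pre-trained query encoder.
query_vec = normalize(rng.normal(size=dim))
# Queries that historically led users to the item (its graph neighbors).
neighbor_queries = normalize(rng.normal(size=(5, dim)))

def relation_score(q_vec, neighbor_vecs):
    """Relevance of a query to an item via the item's neighboring queries.
    Representing the item by the queries that reach it keeps the comparison
    query-to-query, sidestepping the query-title distribution shift.
    Mean cosine similarity is an assumed pooling choice."""
    sims = neighbor_vecs @ q_vec
    return float(sims.mean())

score = relation_score(query_vec, neighbor_queries)
```

In the full hybrid scheme this transfer signal is combined with fine‑tuning on the small labeled set and high‑confidence behavior samples.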

Experiments: Comparative results on AUC, F1‑score, and the KS (Kolmogorov–Smirnov) statistic show the proposed model outperforms baselines such as Sentence‑BERT, a click‑trained model, and MASM. Ablation studies confirm the individual contributions of graph‑derived weak supervision, hard negative sampling, and cluster‑level contrast.

Conclusion: The weakly supervised pre‑training framework effectively captures semantic relevance in e‑commerce search, alleviating long‑tail and annotation scarcity issues. The hybrid fine‑tuning and transfer learning strategy further bridges query‑item distribution gaps, leading to measurable gains in online relevance metrics.

Tags: contrastive learning, information retrieval, e-commerce, graph neural networks, semantic relevance, weak supervision
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
