
Contrastive Learning Perspectives on Retrieval and Ranking Models in Recommendation Systems

This talk explains contrastive learning fundamentals and typical image-domain models such as SimCLR, MoCo, and SwAV, then shows how their core principles (positive/negative sample construction, encoder design, loss functions, alignment and uniformity) can improve dual-tower retrieval and ranking models through negative sampling strategies, embedding normalization, temperature scaling, and graph-based extensions.


The speaker introduces contrastive learning from a historical and technical viewpoint, tracing its roots to metric learning and self‑supervised methods like BERT, and describing its core pipeline: automatic positive‑sample generation, random negative sampling, encoder mapping, and the InfoNCE loss that enforces alignment of positives and uniformity of embeddings.
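This pipeline can be sketched as a minimal InfoNCE loss. The code below is an illustration of the general idea, not an implementation from the talk; the function name and shapes are assumptions:

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: cross-entropy that pulls the positive toward the
    query (alignment) while pushing random negatives away (uniformity).

    query:     (d,) anchor embedding
    positive:  (d,) embedding of the automatically generated positive
    negatives: (n, d) embeddings of randomly sampled negatives
    """
    def l2(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q, p, negs = l2(query), l2(positive), l2(negatives)
    # One logit for the positive, one per negative, scaled by temperature
    logits = np.concatenate(([q @ p], negs @ q)) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at index 0
```

A matching positive drives the loss toward zero, while a mismatched one leaves it near the log of the candidate count, which is exactly the alignment/uniformity trade-off described above.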

Key components of a contrastive learning system are identified as (1) how positives are constructed, (2) how the encoder maps inputs to a projection space, and (3) how the loss function is designed. Variations in these components lead to different models.

Typical image‑domain contrastive models are presented:

SimCLR uses in‑batch negatives, data augmentations for positives, a ResNet encoder plus a projector, L2‑norm on embeddings, and InfoNCE loss with a temperature hyper‑parameter.

MoCo introduces a momentum‑updated encoder and a large negative queue to overcome batch‑size limits.
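The two MoCo mechanisms can be sketched as a momentum (EMA) parameter update plus a fixed-size FIFO of past key embeddings. The class and function names here are assumed for illustration:

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.999):
    """Key-encoder parameters track the query encoder as a slow
    exponential moving average; no gradient flows to the key encoder."""
    return [m * k + (1 - m) * q for k, q in zip(key_params, query_params)]

class NegativeQueue:
    """FIFO of past key embeddings, so the number of negatives is set
    by the queue size rather than the batch size."""
    def __init__(self, size, dim):
        self.buf = np.zeros((size, dim))
        self.ptr = 0

    def enqueue(self, keys):
        n = len(keys)
        idx = (self.ptr + np.arange(n)) % len(self.buf)
        self.buf[idx] = keys          # oldest entries are overwritten
        self.ptr = (self.ptr + n) % len(self.buf)
```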

SwAV combines contrastive learning with clustering, assigning each view to a prototype and optimizing similarity to the prototype.
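The swapped-prediction idea can be sketched as follows. Note this is a simplified stand-in: the real SwAV computes assignments with the Sinkhorn-Knopp algorithm and stops gradients through them, whereas here a plain softmax plays that role:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def swav_swapped_loss(z1, z2, prototypes, temperature=0.1):
    """Each view is scored against learnable prototypes, and the
    assignment of one view supervises the prediction from the other.
    z1, z2: (B, d) view embeddings; prototypes: (K, d)."""
    l2 = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    s1 = l2(z1) @ l2(prototypes).T        # (B, K) prototype scores
    s2 = l2(z2) @ l2(prototypes).T
    p1, p2 = softmax(s1 / temperature), softmax(s2 / temperature)
    q1, q2 = softmax(s1), softmax(s2)     # soft "assignments" (Sinkhorn stand-in)
    # Swapped cross-entropy: view 2's assignment supervises view 1, and vice versa
    return -(q2 * np.log(p1) + q1 * np.log(p2)).sum(axis=1).mean()
```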

The speaker then maps these ideas to recommendation systems, arguing that dual‑tower retrieval models are essentially contrastive learning systems. The dual‑tower architecture splits user and item features, encodes them separately, and computes similarity (inner product or cosine). Practical decisions include:

Negative sampling: in‑batch negatives, global random negatives, or a mix of both to mitigate selection bias.

Embedding normalization: applying L2‑norm (or using cosine similarity) improves training stability and linear separability.

Temperature scaling: a small temperature focuses the loss on hard negatives, similar to focal loss, and yields significant performance gains.

These three practices are explained through the lens of alignment and uniformity: in‑batch negatives increase uniformity, temperature emphasizes hard negatives, and normalization enforces consistent embedding magnitudes.
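All three practices meet in a single in-batch softmax loss for the dual-tower model, sketched below. This is an illustrative minimum; a production system would mix in global random negatives and correct for in-batch sampling bias, as noted above:

```python
import numpy as np

def two_tower_loss(user_emb, item_emb, temperature=0.05):
    """Dual-tower retrieval as contrastive learning: each (user, clicked-item)
    pair is a positive, and the other items in the batch act as negatives.
    A small temperature sharpens the softmax, up-weighting hard negatives."""
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)  # L2-norm
    v = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = u @ v.T / temperature           # (B, B): diagonal holds positives
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()
```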

Beyond standard dual‑tower models, the speaker proposes extensions:

Adding contrastive auxiliary losses on the item side (e.g., dropout‑generated views) to strengthen long‑tail item embeddings.

Applying the same auxiliary loss on the user side, possibly using user behavior sequences or graph neural networks.

Integrating graph neural networks (GNNs) for graph‑based retrieval, where sub‑graph augmentations (node dropping, edge perturbation, feature masking, random walks) create contrastive views.
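Two of the sub-graph augmentations above (edge perturbation and feature masking) can be sketched on a dense adjacency matrix; function names and the dense representation are assumptions for illustration:

```python
import numpy as np

def drop_edges(adj, drop_prob=0.2, rng=None):
    """Edge-perturbation augmentation: independently drop edges of an
    undirected graph (dense adjacency) to create one contrastive view."""
    rng = rng or np.random.default_rng()
    mask = rng.random(adj.shape) >= drop_prob
    mask = np.triu(mask, 1)           # decide once per undirected edge
    mask = mask | mask.T              # keep the view symmetric
    return adj * mask

def mask_features(x, mask_prob=0.2, rng=None):
    """Feature-masking augmentation: zero out random feature columns."""
    rng = rng or np.random.default_rng()
    keep = rng.random(x.shape[1]) >= mask_prob
    return x * keep
```

Encoding two such views of the same sub-graph with a shared GNN and applying an InfoNCE-style loss between them yields the graph contrastive objective described above.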

Finally, the talk suggests that contrastive learning can also be incorporated into ranking models, provided suitable positive‑sample constructions and loss functions are defined.

Tags: contrastive learning, recommendation systems, graph neural networks, InfoNCE, dual-tower models, embedding normalization, temperature scaling
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
