Contrastive Learning: Definitions, Principles, Classic Algorithms, and Applications in Recommendation Systems
This article introduces contrastive learning, explains its definition, principles, and classic algorithms such as SimCLR and MoCo, and details its practical applications in recommendation systems, including a case study of its deployment at Zhuanzhuan that boosted order rates by over 10%.
1 What is Contrastive Learning
1.1 Definition
Contrastive Learning (CL) is a hot research direction in AI, a form of self‑supervised learning that has been highlighted at major conferences (ICLR 2020, NIPS, ACL, KDD, CIKM) and adopted by companies such as Google, Facebook, DeepMind, Alibaba, Tencent, and ByteDance. It has driven state‑of‑the‑art results in computer vision and natural language processing.
1.2 Principle
CL originates from metric learning: given positive and negative sample pairs and an encoder that maps data to a representation space, the objective pushes positives closer and negatives farther apart. Unlike supervised vector‑based retrieval, CL does not require manually labeled data; it relies on data augmentation to generate positives, while negatives are sampled from the dataset.
The core components are (1) constructing positive/negative pairs via augmentation, (2) designing an encoder that preserves information without collapse, and (3) a contrastive loss such as NCE loss.
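The loss in component (3) can be sketched concretely. Below is a minimal NumPy version of an InfoNCE-style contrastive loss for a single anchor; the cosine similarity and the temperature value are illustrative choices, not details taken from the article:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor.

    anchor, positive: 1-D embedding vectors.
    negatives: 2-D array with one negative embedding per row.
    """
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Similarity of the anchor to its positive (index 0) and to each negative.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    # Softmax cross-entropy with the positive as the "correct class".
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

Minimizing this loss pulls the anchor toward its positive and pushes it away from the negatives, which is exactly the "closer/farther" objective described above.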
1.3 Classic Algorithms
Key algorithms include SimCLR, which introduced systematic combinations of data augmentations and a learnable projection head (further refined in SimCLR-v2), and MoCo and MoCo-v2, which maintain a large queue of negative samples encoded by a slowly updated momentum encoder, decoupling the number of negatives from the batch size and stabilizing training.
2 Applications of Contrastive Learning
Beyond academia, CL is applied in recommendation systems. Google’s SSL uses random and correlated feature masking to learn item embeddings for cold‑start items. Alibaba‑Seq2seq applies CL to sequential recommendation by augmenting user behavior sequences.
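The feature-masking idea can be illustrated as splitting an item's categorical features into two complementary views. This is a minimal sketch of the general technique, not Google's implementation; the `[MASK]` token and the mask rate are assumptions:

```python
import numpy as np

def random_feature_mask(features, mask_rate=0.5, rng=None):
    """Split an item's feature list into two complementary masked views;
    the two views of the same item then form a positive pair."""
    rng = rng or np.random.default_rng(0)
    keep_in_a = rng.random(len(features)) < mask_rate
    view_a = [f if keep else "[MASK]" for f, keep in zip(features, keep_in_a)]
    view_b = [f if not keep else "[MASK]" for f, keep in zip(features, keep_in_a)]
    return view_a, view_b
```

Because every feature survives in exactly one view, the encoder must learn representations that agree across disjoint feature subsets, which is what makes the resulting embeddings useful for cold-start items.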
Graph Contrastive Learning (GCL) extends CL to graph data by perturbing edges or nodes while preserving important structures.
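A typical GCL perturbation is random edge dropping: two independently perturbed copies of the same graph serve as a positive pair. A minimal sketch, with an illustrative drop rate:

```python
import random

def drop_edges(edges, drop_rate=0.2, seed=0):
    """Augment a graph by randomly removing a fraction of its edges.

    edges: list of (u, v) tuples. Returns the surviving subset.
    """
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= drop_rate]
```

In practice, importance-aware variants keep structurally critical edges (e.g. high-centrality ones) with higher probability, which is the "preserving important structures" constraint mentioned above.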
3 Practice at Zhuanzhuan
Zhuanzhuan (a second‑hand marketplace) adopts text‑based CL for item retrieval. Item texts are encoded with word2vec and pooled into item vectors; an auto‑encoder produces augmented representations that serve as positives; negatives are sampled randomly within the batch, restricted to items whose similarity, as derived from user click behavior, is low.
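The word2vec-and-pool step can be sketched as mean pooling over token vectors. This is an illustrative reconstruction (the pooling method and the handling of out-of-vocabulary tokens are assumptions, and `w2v` stands in for a trained word2vec lookup):

```python
import numpy as np

def item_embedding(tokens, w2v, dim=64):
    """Mean-pool the word2vec vectors of an item's title tokens.

    w2v: dict mapping token -> np.ndarray of shape (dim,).
    Tokens missing from the vocabulary are skipped; an item with no
    known tokens falls back to the zero vector.
    """
    vecs = [w2v[t] for t in tokens if t in w2v]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```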
The encoder is a twin‑tower network with three fully‑connected layers shared between towers, trained with a binary cross‑entropy loss that predicts whether a pair is similar.
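A minimal NumPy sketch of this twin-tower setup, assuming the two towers share parameters and that pair similarity is scored as a sigmoid of the dot product of the tower outputs (the layer sizes and scoring function are illustrative assumptions):

```python
import numpy as np

def tower(x, weights):
    """One tower: three fully connected layers with ReLU activations.
    Both items are encoded by calling this with the SAME weight list,
    i.e. the towers share parameters."""
    for w in weights:
        x = np.maximum(x @ w, 0.0)
    return x

def pair_score(xa, xb, weights):
    """Twin-tower forward pass: encode both items with the shared tower,
    then squash their dot product into a similarity probability."""
    za, zb = tower(xa, weights), tower(xb, weights)
    return 1.0 / (1.0 + np.exp(-np.dot(za, zb)))

def bce_loss(score, label):
    """Binary cross-entropy on the predicted pair similarity (label 1
    for a positive pair, 0 for a negative pair)."""
    eps = 1e-12
    return -(label * np.log(score + eps) + (1 - label) * np.log(1 - score + eps))
```

Because the towers share weights, the trained item embeddings live in a single space and can be indexed for nearest-neighbor lookup at recall time.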
Deployed in the recall stage, this CL‑enhanced model has increased order‑per‑impression rates by over 10%.
Future work includes extending the learned item vectors to ranking and other downstream tasks, and exploring pre‑training of user embeddings.
Author: Li Guangming, senior algorithm engineer (WeChat: gmlldgm ).
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.