Boost Model Robustness with 5 Lines of R‑Drop Contrastive Learning

This article presents a five‑line PyTorch implementation of R‑Drop, a contrastive‑style regularization technique that uses dropout‑induced perturbations to improve model robustness. It explains the underlying principle, gives the code, and compares the method with ConSERT.

Baobao Algorithm Notes

Dec 16, 2021

Background

Contrastive learning, a popular self‑supervised approach, trains neural networks without extra labeled data by exploiting relationships between samples. The core idea is to use pairwise comparisons as supervision signals, encouraging the model to produce similar representations for perturbed versions of the same input.
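To make the pairwise-comparison idea concrete, here is a minimal sketch of an InfoNCE-style contrastive loss. It is a generic illustration rather than code from the R‑Drop or ConSERT papers: the function name `info_nce_loss`, the toy `encoder`, and the temperature of 0.1 are all assumptions chosen for the example. Two dropout-perturbed encodings of the same batch serve as the two views, and row i of one view is trained to match row i of the other.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrast two views of a batch: row i of z1 should match row i of z2."""
    z1 = F.normalize(z1, dim=-1)  # unit length, so dot products are cosine similarities
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)  # positive for row i is column i
    return F.cross_entropy(sim, labels)


# Hypothetical toy encoder; any module containing dropout would do.
torch.manual_seed(0)
encoder = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Dropout(0.1)
)
encoder.train()  # keep dropout active so the two views differ
x = torch.randn(8, 16)
loss = info_nce_loss(encoder(x), encoder(x))
```

No labels appear anywhere in this loss: the supervision comes entirely from knowing which pairs of rows originate from the same input.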

Five‑Line R‑Drop Implementation

The following PyTorch snippet implements the essential R‑Drop loss in just five lines, adding a KL‑divergence term between two stochastic forward passes to the standard cross‑entropy loss.

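import torch.nn as nn
import torch.nn.functional as F

# training context: `model`, `inputs`, and `target` come from the training loop,
# and the model must be in train mode so that dropout is active
ce = nn.CrossEntropyLoss(reduction='none')
kld = nn.KLDivLoss(reduction='none')
logits1 = model(inputs)  # first forward pass, one dropout mask
logits2 = model(inputs)  # second forward pass, a different dropout mask
# core R‑Drop loss
kl_weight = 0.5  # weight for the KL consistency term
ce_loss = (ce(logits1, target) + ce(logits2, target)) / 2
kl_1 = kld(F.log_softmax(logits1, dim=-1), F.softmax(logits2, dim=-1)).sum(-1)
kl_2 = kld(F.log_softmax(logits2, dim=-1), F.softmax(logits1, dim=-1)).sum(-1)
loss = (ce_loss + kl_weight * (kl_1 + kl_2) / 2).mean()  # average per-sample losses to a scalar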
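For readers who want something that runs end to end, the snippet can be wrapped in a function and exercised on a toy classifier. This is a sketch under stated assumptions, not the paper's reference implementation: the function name `r_drop_loss`, the small `nn.Sequential` model, the random data, and the SGD settings are all made up for illustration. It uses the functional forms `F.cross_entropy` and `F.kl_div` in place of the module objects.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def r_drop_loss(logits1, logits2, target, kl_weight=0.5):
    """Mean cross-entropy of both passes plus a weighted symmetric KL term."""
    ce_loss = (F.cross_entropy(logits1, target) + F.cross_entropy(logits2, target)) / 2
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    # F.kl_div takes log-probabilities as input; log_target=True means the
    # target is also given as log-probabilities. "batchmean" sums over classes
    # and averages over the batch.
    kl_1 = F.kl_div(log_p1, log_p2, reduction="batchmean", log_target=True)
    kl_2 = F.kl_div(log_p2, log_p1, reduction="batchmean", log_target=True)
    return ce_loss + kl_weight * (kl_1 + kl_2) / 2


# Hypothetical toy setup: 10 input features, 3 classes, random data.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.3), nn.Linear(32, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(16, 10)
target = torch.randint(0, 3, (16,))

model.train()  # dropout must be active, otherwise both passes are identical
logits1 = model(inputs)
logits2 = model(inputs)
loss = r_drop_loss(logits1, logits2, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In eval mode dropout is disabled, so the two passes coincide, the KL term vanishes, and the loss reduces to plain cross-entropy. Another option is to duplicate the batch along the batch dimension and run a single forward pass, since dropout draws an independent mask for every row.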

Principle Explanation

During training, dropout is active, introducing randomness into each forward pass. A robust model should produce similar outputs for the same sample even when dropout masks differ. R‑Drop measures the divergence between the two output distributions (using KL‑divergence) and penalizes large differences, thereby encouraging stability against dropout‑induced noise.
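Written out for a single sample with label y, the objective that the snippet computes can be stated as follows. The symbols are introduced here for illustration: p_1 and p_2 are the softmax outputs of the two forward passes, and α corresponds to `kl_weight`. Note that this mirrors the code in this article, which averages the two cross-entropy terms; other formulations may sum them instead.

```latex
\mathcal{L}
  = \tfrac{1}{2}\bigl[\mathrm{CE}(p_1, y) + \mathrm{CE}(p_2, y)\bigr]
  + \alpha \cdot \tfrac{1}{2}\bigl[D_{\mathrm{KL}}(p_2 \,\|\, p_1) + D_{\mathrm{KL}}(p_1 \,\|\, p_2)\bigr],
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{c} p(c)\,\log\frac{p(c)}{q(c)}
```

Because both directions of the KL divergence are included, the penalty is symmetric in the two passes. It is zero exactly when the two output distributions agree.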

The accompanying illustration (originally a GIF) visualizes how the model’s predictions remain consistent despite random dropout perturbations.

Comparison with ConSERT

R‑Drop is conceptually close to ConSERT, which also applies a contrastive objective, in its case to learn sentence representations. R‑Drop is the simpler of the two: it needs only the five‑line loss addition, and it is reported to achieve comparable or slightly better performance on the evaluated tasks. ConSERT also explores perturbations other than dropout, such as token shuffling, cutoff, and adversarial perturbation, but the core idea is the same: outputs for perturbed versions of the same input should agree.

References

Yan Y, Li R, Wang S, et al. ConSERT: A Contrastive Framework for Self‑Supervised Sentence Representation Transfer. arXiv preprint arXiv:2105.11741, 2021.

Liang X, Wu L, Li J, et al. R‑Drop: Regularized Dropout for Neural Networks. arXiv preprint arXiv:2106.14448, 2021.

Su Jianlin’s blog (https://spaces.ac.cn/archives/8496) – additional experiments and insights on Chinese datasets.
