Boost Model Robustness with 5 Lines of R‑Drop Contrastive Learning
This article introduces a five‑line implementation of R‑Drop, a contrastive‑style regularization technique that uses dropout‑induced perturbations to improve model robustness. It explains the underlying principle, gives the PyTorch code, and compares the method with ConSERT.
Background
Contrastive learning, a popular self‑supervised approach, trains neural networks without extra labeled data by exploiting relationships between samples. The core idea is to use pairwise comparisons as supervision signals, encouraging the model to produce similar representations for perturbed versions of the same input.
Five‑Line R‑Drop Implementation
The following PyTorch snippet implements the essential R‑Drop loss in just five lines, adding a KL‑divergence term between two stochastic forward passes to the standard cross‑entropy loss.
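# training context (dropout must be active, so call model.train() first)
import torch.nn as nn
import torch.nn.functional as F
ce = nn.CrossEntropyLoss(reduction='none')  # per-sample cross-entropy
kld = nn.KLDivLoss(reduction='none')  # expects log-probabilities as its first argument
logits1 = model(input)  # first forward pass
logits2 = model(input)  # second pass on the same batch, different dropout mask
# core R‑Drop loss
kl_weight = 0.5 # weight for contrastive loss
ce_loss = (ce(logits1, target) + ce(logits2, target)) / 2
kl_1 = kld(F.log_softmax(logits1, dim=-1), F.softmax(logits2, dim=-1)).sum(-1)
kl_2 = kld(F.log_softmax(logits2, dim=-1), F.softmax(logits1, dim=-1)).sum(-1)
loss = (ce_loss + kl_weight * (kl_1 + kl_2) / 2).mean()  # average per-sample losses into a scalar
Principle Explanation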
During training, dropout is active, introducing randomness into each forward pass. A robust model should produce similar outputs for the same sample even when dropout masks differ. R‑Drop measures the divergence between the two output distributions (using KL‑divergence) and penalizes large differences, thereby encouraging stability against dropout‑induced noise.
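The effect of dropout on the consistency term can be checked on a toy model. The sketch below is illustrative only: the helper name `r_drop_loss`, the small `nn.Sequential` classifier, and the random batch are assumptions, not part of the original article. It uses `F.kl_div` with `log_target=True`, which should match the summed‑then‑averaged KL of the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def r_drop_loss(logits1, logits2, target, kl_weight=0.5):
    """Return (total loss, symmetric KL term) for two passes over one batch."""
    # Average the supervised loss over both stochastic passes.
    ce_loss = (F.cross_entropy(logits1, target) + F.cross_entropy(logits2, target)) / 2
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    # F.kl_div takes log-probabilities as input; log_target=True means the
    # target is in log space too. "batchmean" sums over classes and averages
    # over the batch.
    kl_1 = F.kl_div(logp1, logp2, reduction="batchmean", log_target=True)
    kl_2 = F.kl_div(logp2, logp1, reduction="batchmean", log_target=True)
    sym_kl = (kl_1 + kl_2) / 2
    return ce_loss + kl_weight * sym_kl, sym_kl


torch.manual_seed(0)
# Toy classifier (assumption): 8 input features, 3 classes, one dropout layer.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(16, 3))
x = torch.randn(4, 8)
y = torch.tensor([0, 1, 2, 0])

model.train()  # dropout active: the two passes use different masks
loss_train, kl_train = r_drop_loss(model(x), model(x), y)
loss_train.backward()  # the penalty is differentiable like any other loss term

model.eval()  # dropout disabled: both passes are identical
with torch.no_grad():
    _, kl_eval = r_drop_loss(model(x), model(x), y)

print(f"KL with dropout on:  {kl_train.item():.4f}")
print(f"KL with dropout off: {kl_eval.item():.4f}")
```

In training mode the two passes disagree, so the KL term is positive and contributes a gradient. In evaluation mode dropout is disabled, the passes coincide, and the term vanishes, which is why the regularizer only acts during training.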
The accompanying illustration (originally a GIF) visualizes how the model’s predictions remain consistent despite random dropout perturbations.
Comparison with ConSERT
R‑Drop is conceptually similar to the ConSERT framework, which also uses contrastive objectives for sentence representations. However, R‑Drop is simpler and cleaner, requiring only the five‑line loss addition, while achieving comparable or slightly better performance on the evaluated tasks. ConSERT explores additional perturbations beyond dropout, but the core idea remains the same.
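To make the contrast concrete, here is a minimal sketch of an NT‑Xent‑style loss, the kind of objective the ConSERT paper applies to sentence embeddings. The function name `nt_xent_loss`, the temperature of 0.1, and the toy embeddings are assumptions for illustration, not ConSERT's actual code. Unlike R‑Drop, which compares the output distributions of two passes over the same sample, this loss also pushes apart the embeddings of different samples in the batch.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent over two views; z1[i] and z2[i] embed the same sentence."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)  # unit-length rows
    sim = z @ z.t() / temperature  # scaled cosine similarities, shape (2n, 2n)
    # A sample must not count as its own negative.
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # The positive for row i is the other view of the same sentence.
    positives = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, positives)


z1 = torch.eye(4)  # four toy "sentence embeddings", mutually orthogonal
aligned = nt_xent_loss(z1, z1.clone())  # each view matches its partner
misaligned = nt_xent_loss(z1, torch.roll(z1, shifts=1, dims=0))  # partners shuffled
print(aligned.item(), misaligned.item())
```

When the two views of each sentence agree, the loss is close to zero. When the pairing is shuffled, it is large, because each embedding is then closest to a sample the loss treats as a negative.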
References
Yan Y, Li R, Wang S, et al. ConSERT: A Contrastive Framework for Self‑Supervised Sentence Representation Transfer. arXiv preprint arXiv:2105.11741, 2021.
Liang X, Wu L, Li J, et al. R‑Drop: Regularized Dropout for Neural Networks. arXiv preprint arXiv:2106.14448, 2021.
Su Jianlin’s blog (https://spaces.ac.cn/archives/8496) – additional experiments and insights on Chinese datasets.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.