Why Data Augmentation Triggers OOD Fluctuations and How PEER Solves It
Data augmentation is popular for single-source domain generalization, yet it often induces severe out-of-distribution (OOD) performance swings during training. The PEER framework combats this with dual-model collaboration, mutual-information regularization, periodic parameter averaging, and dynamic augmentation, achieving state-of-the-art robustness across multiple benchmark datasets.
Problem: Mid‑training OOD fluctuation
Data augmentation in single-source domain generalization causes severe out-of-distribution (OOD) performance volatility, which is most pronounced in the middle of training. The fluctuation stems from feature distortion: learning from diverse augmented samples interferes with previously acquired features. The effect worsens with more complex augmentations, with larger source-target gaps, and when the distribution shift introduced by augmentation exceeds the natural gap between the source and target domains.
PEER framework
Dual‑model interaction
Task model (F): frozen; accumulates stable knowledge and guides learning.
Proxy model (P): trainable; learns from augmented data under the guidance of F.
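A minimal sketch of this dual-model setup, using toy single-layer "models" in NumPy (the backbones, shapes, and update rule here are illustrative assumptions, not details from the paper): F and P start from the same weights, but only P receives gradient updates between averaging steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a network: a single linear feature extractor.
class LinearModel:
    def __init__(self, w):
        self.w = w.copy()          # parameters
    def features(self, x):
        return x @ self.w          # feature extraction

dim_in, dim_feat = 8, 4
w0 = rng.normal(size=(dim_in, dim_feat))

task_model  = LinearModel(w0)      # F: frozen, accumulates stable knowledge
proxy_model = LinearModel(w0)      # P: trainable, learns from augmented data

# Only P is updated; F stays fixed until the next parameter-averaging step.
grad = rng.normal(size=w0.shape)   # stand-in for a real gradient
proxy_model.w -= 0.1 * grad        # one SGD-style step on augmented data

assert np.allclose(task_model.w, w0)        # F unchanged
assert not np.allclose(proxy_model.w, w0)   # P has moved
```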
Key components
Mutual-information regularization: a shared projection head maximizes mutual information between the features F extracts from original samples and the features P extracts from augmented samples, aligning their representations.
Periodic parameter averaging: every k epochs, F updates its parameters to the average of P's historical parameters, aggregating knowledge from diverse augmentations.
Dynamic augmentation: the augmentation function G is updated synchronously with P, providing richer distributional variation while the averaging maintains stability.
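The three components above can be combined into one training-loop sketch. This is a hedged illustration, not the paper's implementation: negative cosine similarity stands in for the mutual-information objective, additive noise with a growing scale stands in for the dynamic augmentation G, and restarting P from the averaged weights after each sync is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, k, n_epochs, lr, eps = 6, 3, 9, 0.1, 1e-5

theta_f = rng.normal(size=dim)        # task model F (frozen between syncs)
theta_p = theta_f.copy()              # proxy model P (trainable)
w_head = rng.normal(size=(dim, dim))  # shared projection head (illustrative)
history = []                          # P's parameters since the last average

def align_loss(z_f, z_p):
    # Negative cosine similarity: a simple surrogate for maximizing
    # mutual information between F's and P's projected features.
    z_f, z_p = z_f / np.linalg.norm(z_f), z_p / np.linalg.norm(z_p)
    return -float(z_f @ z_p)

for epoch in range(1, n_epochs + 1):
    x = rng.normal(size=dim)                         # "original" sample
    x_aug = x + 0.1 * epoch * rng.normal(size=dim)   # dynamic augmentation G

    z_f = w_head @ (theta_f * x)                     # F's projected features
    loss = align_loss(z_f, w_head @ (theta_p * x_aug))

    # Finite-difference gradient of the alignment loss w.r.t. P
    # (a crude stand-in for backprop).
    g = np.zeros(dim)
    for i in range(dim):
        tp = theta_p.copy()
        tp[i] += eps
        g[i] = (align_loss(z_f, w_head @ (tp * x_aug)) - loss) / eps
    theta_p = theta_p - lr * g
    history.append(theta_p.copy())

    if epoch % k == 0:
        theta_f = np.mean(history, axis=0)  # F absorbs P's history
        theta_p = theta_f.copy()            # assumed: P restarts from F
        history.clear()
```

The averaging step is the stabilizer: F only ever moves to the mean of P's recent trajectory, so a single bad augmentation epoch cannot swing F's representation.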
Experimental results
Digits: +7.08% absolute accuracy over RandAug.
PACS: +2.30% over comparable methods.
Office-Home: +10.62%.
VLCS: +6.66%.
Fluctuation suppression
PACS variance reduced from 1.82 to 0.63 (65.3% reduction).
Digits variance reduced from 3.27 to 0.91 (72.2% reduction).
Analysis
Centered Kernel Alignment (CKA) shows that PEER aligns task and proxy representations, preserving knowledge across training stages.
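For reference, the linear variant of CKA commonly used in such representation-similarity analyses (the paper may use a different kernel) can be computed in a few lines:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0)                       # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2   # cross-similarity term
    norm_x = np.linalg.norm(X.T @ X, 'fro')      # self-similarity norms
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(2)
feats = rng.normal(size=(100, 16))
assert np.isclose(linear_cka(feats, feats), 1.0)        # identical features
assert np.isclose(linear_cka(feats, 2.0 * feats), 1.0)  # scale-invariant
```

A CKA near 1 between task-model and proxy-model features indicates the aligned representations the analysis reports.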
Parameter‑space connectivity experiments demonstrate smoother parameter trajectories for the proxy model; interpolations between checkpoints improve performance, confirming the benefit of periodic averaging.
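The interpolation experiment can be illustrated with a toy loss surface (the quadratic below is purely illustrative): evaluate the loss along the straight line between two checkpoints, and in a connected valley the interpolated models beat both endpoints, which is the behavior periodic averaging exploits.

```python
import numpy as np

def loss(theta):
    # Toy quadratic loss standing in for OOD error; minimum at theta = 1.
    return float(np.sum((theta - 1.0) ** 2))

theta_a = np.array([0.0, 0.0])   # earlier proxy checkpoint (illustrative)
theta_b = np.array([2.0, 2.0])   # later proxy checkpoint (illustrative)

# Loss along the linear path theta(alpha) = (1 - alpha)*theta_a + alpha*theta_b.
alphas = np.linspace(0.0, 1.0, 11)
path = [loss((1 - a) * theta_a + a * theta_b) for a in alphas]

# Interpolated points outperform both endpoints in this connected valley.
assert min(path) < min(path[0], path[-1])
```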
Conclusion
PEER mitigates mid‑training OOD fluctuations in single‑source domain generalization by combining a frozen task model with a trainable proxy model, mutual‑information regularization, and periodic parameter averaging. The method achieves state‑of‑the‑art performance on multiple benchmarks while stabilizing OOD accuracy.