Machine Learning Algorithms & Natural Language Processing
Feb 22, 2026 · Artificial Intelligence

What Is On-Policy Distillation? A Deep Dive into On-Policy and Self-Distillation

This article explains on‑policy distillation and derives its forward and reverse KL gradients. It introduces self‑distillation, where the policy serves as its own teacher, and discusses practical implementation tricks such as extra‑knowledge injection and stabilizing the teacher with EMA or a trust region. It also highlights benefits such as reduced catastrophic forgetting, fewer "Aha" moments, and a narrower train‑test gap, especially for larger models.
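The reverse KL mentioned in the summary can be illustrated with a toy per-token computation. This is a minimal pure-Python sketch, not code from the article; in on-policy distillation the expectation is taken under the student's own samples, which makes the reverse KL mode-seeking:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reverse_kl(student_logits, teacher_logits):
    # Per-token reverse KL, KL(student || teacher): the student is
    # penalized for placing mass where the teacher places little.
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    return sum(si * (math.log(si) - math.log(ti)) for si, ti in zip(s, t))
```

With identical logits the divergence is zero; it grows as the student's distribution drifts away from the teacher's.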

Catastrophic Forgetting · EMA · KL Divergence
6 min read
Baobao Algorithm Notes
Jul 26, 2022 · Artificial Intelligence

Boost Model Accuracy with 6 Proven Training Tricks

This article compiles six practical machine‑learning tricks—including adversarial training (FGM), EMA/SWA, R‑Drop contrastive loss, test‑time augmentation, pseudo‑labeling, and missing‑value imputation—explaining their principles, providing ready‑to‑use code snippets, and discussing their benefits and trade‑offs for stable and faster model training.
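Of the tricks listed, EMA is the simplest to sketch. The following is a hypothetical minimal implementation (class and parameter names are illustrative, and the decay value is a common default, not one prescribed by the article):

```python
class EMA:
    """Exponential moving average of model weights.

    Maintains 'shadow' copies of the parameters that are updated as
    shadow = decay * shadow + (1 - decay) * current after each step.
    """

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = dict(params)  # copy initial weights as the shadow

    def update(self, params):
        for name, value in params.items():
            self.shadow[name] = (
                self.decay * self.shadow[name] + (1 - self.decay) * value
            )
```

At evaluation time the shadow weights are swapped in place of the raw weights, which typically smooths out late-training noise.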

AI · EMA · R-Drop
10 min read
Baobao Algorithm Notes
Dec 14, 2021 · Artificial Intelligence

Five Quick Tricks to Supercharge Your Neural Network Training

This article presents five concise, widely applicable techniques—adversarial training with FGM, exponential moving average (EMA), test‑time augmentation (TTA), pseudo‑label learning, and special‑sample handling via nearest‑neighbor retrieval—to reliably improve model performance with minimal code changes.
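The FGM step named here perturbs the embedding along the loss gradient, scaled to a fixed norm. A toy numeric sketch of that perturbation (the function name and epsilon are illustrative assumptions, not from the article):

```python
import math

def fgm_perturb(embedding, grad, epsilon=1.0):
    # Fast Gradient Method: move the embedding by epsilon in the
    # direction of the (L2-normalized) gradient of the loss.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)  # zero gradient: no perturbation
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]
```

In practice the model is trained on the loss at both the clean and the perturbed embeddings, then the perturbation is removed before the optimizer step.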

AI · EMA · TTA
7 min read