Tagged articles

KL divergence

12 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 16, 2026 · Artificial Intelligence

SFT, DAgger, Offline RL, and OPD: Four Methods Mapped onto a Single 2×2 Grid

The paper shows that SFT, DAgger, offline RL, and OPD are the four orthogonal combinations of prefix source (teacher vs. student) and KL direction (forward vs. reverse). This exposes three hidden trade‑offs (KL direction, prefix source, and training length), and the paper proposes KL‑mixing and entropy‑gated length curricula that boost Avg@k by 3.6 points, raise Pass@k by up to 5.8 points, and cut response length threefold.

DAgger · KL divergence · LLM distillation

0 likes · 17 min read

Machine Heart

May 29, 2026 · Artificial Intelligence

DiffusionOPD: A New Online Policy Distillation Paradigm for Multi‑Task Diffusion Models

DiffusionOPD introduces a unified on‑policy distillation framework for diffusion models that decouples single‑task online policy exploration from multi‑task capability integration, training expert teachers per task and distilling their skills into a single student model, achieving faster convergence and higher performance across composition, OCR, and aesthetic tasks.

Diffusion Models · KL divergence · Multi-Task Learning

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Apr 12, 2026 · Artificial Intelligence

Deep Dive into Forward vs Reverse KL Divergence: When to Use Which?

The article explains the definitions, properties, and asymmetric nature of KL divergence, compares Forward KL (mean‑seeking) and Reverse KL (mode‑seeking) through bimodal examples, and provides practical guidelines for choosing between them based on sampling and probability‑evaluation capabilities in machine‑learning tasks.

Forward KL · KL divergence · Reverse KL

0 likes · 10 min read

Machine Learning Algorithms & Natural Language Processing

Feb 22, 2026 · Artificial Intelligence

What Is On-Policy Distillation? A Deep Dive into On-Policy and Self-Distillation

The article explains On-Policy Distillation, derives its forward and reverse KL gradients, introduces Self‑Distillation where the policy serves as its own teacher, discusses practical implementation tricks such as extra‑knowledge injection, EMA or trust‑region teacher stabilization, and highlights benefits like reduced catastrophic forgetting, fewer Aha moments, and a narrower train‑test gap, especially for larger models.

Catastrophic Forgetting · EMA · KL divergence

0 likes · 6 min read

Baobao Algorithm Notes

Nov 18, 2025 · Artificial Intelligence

How LightReasoner Lets Small Models Teach Large Models to Reason Efficiently

The LightReasoner paper from the University of Hong Kong shows that small language models can guide large models on critical reasoning steps, achieving up to 90% faster inference and significant accuracy gains across multiple math benchmarks.

Contrastive Decoding · KL divergence · Large Language Models

0 likes · 9 min read

AI Algorithm Path

May 10, 2025 · Artificial Intelligence

Master KL Divergence: Definitions, Properties, and Real‑World Applications

This article explains the Kullback‑Leibler (KL) divergence for discrete and continuous distributions, outlines its non‑negativity and asymmetry, walks through a uniform‑distribution example, provides a simple Python demonstration, and discusses key applications in variational autoencoders, reinforcement‑learning policy optimization, and other machine‑learning contexts.

KL divergence · Variational Autoencoder · information theory

0 likes · 7 min read

Code DAO

May 6, 2022 · Fundamentals

Information Theory Foundations for Machine Learning and Deep Learning

The article explains Shannon information content, entropy, cross‑entropy, KL divergence, conditional entropy, and mutual information. It illustrates each concept with coin‑flip and dice examples and visual formulas, and discusses their roles as loss functions and evaluation metrics in machine‑learning models.

KL divergence · cross entropy · entropy

0 likes · 8 min read

Code DAO

Dec 20, 2021 · Artificial Intelligence

Exploring Latent Space with a Variational Autoencoder in TensorFlow

This article explains the theory behind variational autoencoders, details their KL‑divergence loss, provides a complete TensorFlow implementation, and demonstrates reconstruction, latent‑space visualization, and novel image generation through sampling and interpolation.

KL divergence · Python · TensorFlow

0 likes · 13 min read

Code DAO

Dec 10, 2021 · Artificial Intelligence

Understanding Variational Autoencoders: From Dimensionality Reduction to Generative Modeling

This article explains the principles of variational autoencoders. It starts with dimensionality reduction techniques such as PCA and standard autoencoders, highlights their limitations for data generation, and then details the VAE's regularized latent space, variational inference, re‑parameterization, and loss formulation.

Deep Learning · KL divergence · VAE

0 likes · 18 min read

21CTO

Feb 7, 2018 · Artificial Intelligence

Demystifying Entropy: From Basic Concepts to Cross‑Entropy and KL Divergence

This article explains entropy, joint entropy, conditional entropy, and related measures such as KL divergence and cross‑entropy, using intuitive coin‑flip examples and mathematical formulas to show how they quantify uncertainty and information in probability distributions.

KL divergence · cross entropy · entropy

0 likes · 14 min read

Architecture Digest

Feb 3, 2018 · Artificial Intelligence

Understanding Entropy, Joint Entropy, Conditional Entropy, Relative Entropy, and Cross Entropy

This article explains the concepts of entropy, joint entropy, conditional entropy, relative entropy (KL divergence) and cross‑entropy, illustrating their definitions, mathematical formulas, intuitive interpretations, and relationships through simple probability examples and visual diagrams.

KL divergence · cross entropy · entropy

0 likes · 14 min read

Qunar Tech Salon

Mar 14, 2015 · Artificial Intelligence

Common Distance and Similarity Measures in Machine Learning and Data Mining

This article reviews the most frequently used distance and similarity formulas in machine learning and data mining, explaining their definitions, mathematical properties, practical examples, and when each metric is appropriate for measuring differences between data points or probability distributions.

Cosine Similarity · KL divergence · Mahalanobis Distance

0 likes · 13 min read
