Tagged articles

Importance Sampling

9 articles

Machine Learning Algorithms & Natural Language Processing

Mar 1, 2026 · Artificial Intelligence

From Traditional RL to LLM RL: Theory Derivation and Practical Engineering Improvements

This article walks through the fundamental derivation of policy‑based reinforcement learning, explains how traditional RL concepts extend to large‑language‑model RL, and details engineering enhancements such as GRPO memory reduction, asynchronous rollout, importance‑sampling corrections, and token‑flow management for stable industrial‑scale training.

GRPO · Importance Sampling · RLHF

0 likes · 11 min read

Machine Learning Algorithms & Natural Language Processing

Feb 24, 2026 · Artificial Intelligence

From Traditional RL to LLM‑RL: Theory Derivation and Engineering Improvements

The article walks through the fundamentals of traditional policy‑gradient reinforcement learning, derives the Reinforce objective, maps its concepts to large‑language‑model RL, and then discusses practical engineering solutions such as GRPO, async rollout, importance‑sampling corrections, and token‑flow management for industrial‑scale training.

Async Rollout · GRPO · Importance Sampling

0 likes · 10 min read

AI Frontier Lectures

Dec 9, 2025 · Artificial Intelligence

Can Token‑Level Surrogates Stabilize RL for Large Language Models? A Deep Dive

This article analyzes why optimizing sequence‑level rewards for LLMs with token‑level surrogate objectives can improve reinforcement‑learning stability, explains the theoretical conditions required, introduces Routing Replay for MoE models, and presents extensive experiments validating the approach.

Importance Sampling · Large Language Models · Mixture of Experts

0 likes · 12 min read

Data Party THU

Sep 4, 2025 · Artificial Intelligence

Unraveling PPO Variants: From GRPO to DAPO and GSPO – A Deep Dive

This article provides a comprehensive technical analysis of PPO‑based reinforcement learning methods for large language models, detailing the evolution from the original PPO algorithm through GRPO, DAPO, and GSPO, and explaining their motivations, mathematical formulations, advantages, and practical challenges such as entropy collapse and importance‑sampling variance.

DAPO · GRPO · GSPO

0 likes · 30 min read

AI Algorithm Path

May 22, 2025 · Artificial Intelligence

Monte Carlo Policy Improvement in RL: Epsilon‑Greedy, On‑Policy vs Off‑Policy, and Incremental Updates

This tutorial explains how Monte Carlo methods are enhanced in reinforcement learning through epsilon‑greedy and epsilon‑soft policies, Monte Carlo control, a Blackjack Q‑function example, the distinction between on‑policy and off‑policy learning, importance sampling, and efficient incremental update techniques.

Epsilon-Greedy · Importance Sampling · Monte Carlo

0 likes · 14 min read

Architecture Development Notes

Dec 1, 2024 · Fundamentals

How to Add Importance‑Sampling PDFs to a Rust Ray Tracer

This article walks through implementing probability‑density‑function (PDF) based importance sampling in a Rust ray‑tracing renderer, covering trait definitions, concrete PDF types for spheres, cosine distributions, hittable objects, quad geometry, material adjustments, and integration into the rendering loop to achieve faster convergence and higher image quality.

Graphics · Importance Sampling · PDF

0 likes · 15 min read

Baobao Algorithm Notes

Oct 25, 2024 · Artificial Intelligence

How to Use Importance Sampling for Effective Continue Pretraining of LLMs

Continued pretraining (CP) sits between pretraining and SFT and injects domain knowledge, but it is prone to catastrophic forgetting. This article explores using importance sampling to balance general and domain data, and discusses data selection, annealing strategies, and practical tips for mitigating forgetting while strengthening specialized capabilities.

Catastrophic Forgetting · Continue Pretraining · Domain Adaptation

0 likes · 8 min read

Alimama Tech

Apr 27, 2022 · Artificial Intelligence

DEFUSE and Bi-DEFUSE: Unbiased Delayed‑Feedback Modeling for CVR Prediction

The paper introduces DEFUSE and its multi‑task extension Bi‑DEFUSE, unbiased delayed‑feedback CVR models that correct label bias via rigorous importance sampling and a latent fake‑negative variable, achieving superior offline performance and a 2% CVR lift in online deployment compared with existing industry baselines.

Bi-DEFUSE · CVR · DEFUSE

0 likes · 25 min read

Hulu Beijing

Mar 8, 2018 · Artificial Intelligence

Master Common Sampling Techniques: Inverse Transform, Rejection, Importance & MCMC

This article explains the core ideas and step-by-step procedures of widely used sampling methods: inverse transform sampling, rejection sampling, importance sampling, and Markov Chain Monte Carlo techniques such as Metropolis‑Hastings and Gibbs sampling. It covers their mathematical foundations, practical implementations, and when each method is appropriate.

Importance Sampling · MCMC · Monte Carlo

0 likes · 11 min read
