Tagged articles
9 articles
Machine Learning Algorithms & Natural Language Processing
Mar 1, 2026 · Artificial Intelligence

From Traditional RL to LLM RL: Theory Derivation and Practical Engineering Improvements

This article walks through the fundamental derivation of policy‑based reinforcement learning, explains how traditional RL concepts extend to large‑language‑model RL, and details engineering enhancements such as GRPO memory reduction, asynchronous rollout, importance‑sampling corrections, and token‑flow management for stable industrial‑scale training.

Asynchronous Rollout · GRPO · Importance Sampling
11 min read
Machine Learning Algorithms & Natural Language Processing
Feb 24, 2026 · Artificial Intelligence

From Traditional RL to LLM‑RL: Theory Derivation and Engineering Improvements

The article walks through the fundamentals of traditional policy‑gradient reinforcement learning, derives the REINFORCE objective, maps its concepts to large‑language‑model RL, and then discusses practical engineering solutions such as GRPO, async rollout, importance‑sampling corrections, and token‑flow management for industrial‑scale training.

Async Rollout · GRPO · Importance Sampling
10 min read
AI Frontier Lectures
Dec 9, 2025 · Artificial Intelligence

Can Token‑Level Surrogates Stabilize RL for Large Language Models? A Deep Dive

This article analyzes why optimizing sequence‑level rewards for LLMs with token‑level surrogate objectives can improve reinforcement‑learning stability, explains the theoretical conditions required, introduces Routing Replay for MoE models, and presents extensive experiments validating the approach.

Importance Sampling · Mixture of Experts · large language models
12 min read
Data Party THU
Sep 4, 2025 · Artificial Intelligence

Unraveling PPO Variants: From GRPO to DAPO and GSPO – A Deep Dive

This article provides a comprehensive technical analysis of PPO‑based reinforcement learning methods for large language models, detailing the evolution from the original PPO algorithm through GRPO, DAPO, and GSPO, and explaining their motivations, mathematical formulations, advantages, and practical challenges such as entropy collapse and importance‑sampling variance.

DAPO · GRPO · GSPO
30 min read
AI Algorithm Path
May 22, 2025 · Artificial Intelligence

Monte Carlo Policy Improvement in RL: Epsilon‑Greedy, On‑Policy vs Off‑Policy, and Incremental Updates

This tutorial explains how Monte Carlo methods are enhanced in reinforcement learning through epsilon‑greedy and epsilon‑soft policies, Monte Carlo control, a Blackjack Q‑function example, the distinction between on‑policy and off‑policy learning, importance sampling, and efficient incremental update techniques.

Epsilon-Greedy · Importance Sampling · Monte Carlo
14 min read
Architecture Development Notes
Dec 1, 2024 · Fundamentals

How to Add Importance‑Sampling PDFs to a Rust Ray Tracer

This article walks through implementing probability‑density‑function (PDF) based importance sampling in a Rust ray‑tracing renderer, covering trait definitions, concrete PDF types for spheres, cosine distributions, hittable objects, quad geometry, material adjustments, and integration into the rendering loop to achieve faster convergence and higher image quality.

Graphics · Importance Sampling · PDF
15 min read
Baobao Algorithm Notes
Oct 25, 2024 · Artificial Intelligence

How to Use Importance Sampling for Effective Continue Pretraining of LLMs

Continued pretraining (CP) sits between pretraining and SFT to inject domain knowledge, but it risks catastrophic forgetting. This article explores using importance sampling to balance general and domain data, and discusses data selection, annealing strategies, and practical tips for mitigating forgetting while strengthening specialized capabilities.

Catastrophic Forgetting · Continue Pretraining · Importance Sampling
8 min read
Alimama Tech
Apr 27, 2022 · Artificial Intelligence

DEFUSE and Bi-DEFUSE: Unbiased Delayed‑Feedback Modeling for CVR Prediction

The paper introduces DEFUSE and its multi‑task extension Bi‑DEFUSE, unbiased delayed‑feedback CVR models that correct label bias through importance sampling and a latent fake‑negative variable. They achieve better offline performance than existing industry baselines and a 2% CVR lift in online deployment.

Bi-DEFUSE · CVR · DEFUSE
25 min read
Hulu Beijing
Mar 8, 2018 · Artificial Intelligence

Master Common Sampling Techniques: Inverse Transform, Rejection, Importance & MCMC

This article explains the core ideas and step-by-step procedures of widely used sampling methods—including inverse transform, rejection, importance, and Markov Chain Monte Carlo techniques such as Metropolis‑Hastings and Gibbs—highlighting their mathematical foundations, practical implementations, and when each method is appropriate.

Importance Sampling · MCMC · Monte Carlo
11 min read