Rejection Sampling — 3 Technical Articles

Oct 22, 2024 · Artificial Intelligence

Uncovering Hidden Assumptions in RLHF: Theory, DPO & PPO Pitfalls

This article analytically explores the implicit assumptions behind the RLHF optimization objective, examines how they limit DPO and PPO methods, and proposes practical improvements such as rejection sampling and online on‑policy data selection to narrow the gap between theory and practice.

AI alignment · DPO · PPO

0 likes · 22 min read

NewBeeNLP

Apr 1, 2024 · Artificial Intelligence

How Llama 2 Uses RLHF, PPO, Rejection Sampling, and Ghost Attention

This article provides a detailed technical walkthrough of Llama 2's Reinforcement Learning with Human Feedback pipeline, covering human preference data collection, reward‑model design and training, iterative fine‑tuning with PPO and rejection sampling, the Ghost Attention technique for multi‑turn consistency, and the resulting experimental evaluations.

Ghost Attention · Llama-2 · PPO

0 likes · 18 min read

Hulu Beijing

Mar 8, 2018 · Artificial Intelligence

Master Common Sampling Techniques: Inverse Transform, Rejection, Importance & MCMC

This article explains the core ideas and step-by-step procedures of widely used sampling methods, including inverse transform sampling, rejection sampling, importance sampling, and Markov Chain Monte Carlo techniques such as Metropolis‑Hastings and Gibbs sampling. It covers their mathematical foundations, practical implementations, and when each method is appropriate.

Importance Sampling · MCMC · Monte Carlo

0 likes · 11 min read
