OpenRLHF — 3 Technical Articles

May 20, 2025 · Artificial Intelligence

Boosting RLHF Training Efficiency with Asynchronous vLLM and Ray Integration

This article explains how an asynchronous RLHF pipeline built on vLLM, Ray, and OpenRLHF reduces training bottlenecks by decoupling inference, environment interaction, and model updates. It also presents the implementation code and the design choices behind scaling reinforcement learning this way.

OpenRLHF · RLHF · Ray

11 min read

Baobao Algorithm Notes

Nov 19, 2024 · Artificial Intelligence

Demystifying OpenRLHF Loss Functions: From GPTLM to KTO and Beyond

This article walks through the loss functions used in OpenRLHF (GPTLMLoss, KDLoss, DPOLoss, KTOLoss, and the reward model losses), covering their mathematical foundations, implementation details, and practical considerations for RLHF training.

DPO · KTO · Loss Functions

23 min read

Baobao Algorithm Notes

Oct 21, 2024 · Artificial Intelligence

Unraveling RLHF: From PPO to DPO and Beyond – A Comprehensive Guide

This article provides a thorough, four‑part overview of RLHF for large language models, covering preference‑optimization algorithms (PPO‑based and offline RL approaches), reward‑model training techniques, inference‑time exploration strategies, and practical implementation details including the OpenRLHF framework and resource‑allocation tricks.

DPO · LLM optimization · OpenRLHF

27 min read
