Tagged articles
2 articles
DataFunSummit
Mar 30, 2025 · Artificial Intelligence

RLHF Techniques and Challenges in Large Language Models and Multimodal Applications

This article reviews reinforcement learning, RLHF, and related alignment techniques for large language models and multimodal systems. It covers the fundamentals; recent advances such as InstructGPT, Constitutional AI, RLAIF, Super Alignment, GPT‑4o, and video LLMs; and experimental evaluations of the proposed methods.

RLHF · multimodal alignment · preference learning
26 min read
Baobao Algorithm Notes
Jul 9, 2024 · Artificial Intelligence

Why Step-Level DPO Is Revolutionizing LLM Math Reasoning

This article reviews recent step‑level DPO research, compares it with instance‑level DPO, explains the underlying Monte Carlo Tree Search formulation, and presents the author’s own replication experiments that demonstrate consistent performance gains across multiple LLM sizes on GSM8K and MATH benchmarks.

AI research · LLM alignment · MCTS
10 min read