NewBeeNLP
Sep 23, 2024 · Artificial Intelligence
Why Post‑Training Is Redefining LLMs: DPO vs PPO, Synthetic Data, and Scaling Strategies
This article analyzes recent post‑training trends in large language models, comparing DPO and PPO, examining the scarcity of open‑source preference data, the iterative training process, the rise of synthetic data pipelines, and emerging methods for improving math and reasoning capabilities.
DPO · LLM · PPO
12 min read
