Baobao Algorithm Notes
Baobao Algorithm Notes
Jan 24, 2026 · Artificial Intelligence

What Advances Do GRPO, DAPO, GSPO, and SAPO Bring Over PPO?

After DPO, the typical research trajectory moves through GRPO, DAPO, GSPO, and SAPO, each introducing new optimization objectives, sampling strategies, and reward‑shaping techniques that aim to reduce memory usage, improve gradient stability, and enhance the efficiency of large‑model reinforcement learning.

DAPOGRPOGSPO
0 likes · 6 min read
What Advances Do GRPO, DAPO, GSPO, and SAPO Bring Over PPO?