Baobao Algorithm Notes
Aug 17, 2025 · Artificial Intelligence
Boost 7B LLM Math Reasoning Beyond GPT‑4o with a Simple Pass@k Reward
Replacing the traditional Pass@1 reward with a Pass@k formulation and a lightweight advantage computation lets a 7B language model surpass GPT‑4o on math reasoning benchmarks, while adding only a few lines of code and minimal training overhead.
Python · RLHF · reward engineering
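To make the Pass@1 versus Pass@k distinction concrete, here is a minimal sketch rather than the article's own implementation. It assumes each sampled answer is scored as simply correct or incorrect, and it uses the standard combinatorial Pass@k estimator. The names `pass_at_k` and `group_reward` are hypothetical and chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n sampled answers, c of which are correct.

    Returns the probability that a random subset of k of the n samples
    contains at least one correct answer: 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are wrong,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def group_reward(correct: list[bool]) -> float:
    """Assumed Pass@k-style reward for one group of k answers:
    1.0 if any answer in the group is correct, else 0.0."""
    return 1.0 if any(correct) else 0.0


if __name__ == "__main__":
    # With k = 1 the estimate reduces to plain accuracy (Pass@1).
    print(pass_at_k(n=4, c=1, k=1))
    # With k = 2 the same samples score higher, since one hit per group suffices.
    print(pass_at_k(n=4, c=1, k=2))
    print(group_reward([False, True]))
```

The point of the sketch is that a Pass@k reward credits a group of attempts when any one of them succeeds, which rewards diverse attempts more than a per-sample Pass@1 reward does. How that reward is converted into per-sample advantages is not covered here.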
