Boost 7B LLM Math Reasoning Beyond GPT‑4o with a Simple Pass@k Reward

Replacing the traditional Pass@1 reward with a Pass@k formulation and a lightweight advantage computation lets a 7B language model substantially improve on math reasoning benchmarks and surpass GPT‑4o. The change takes only a few lines of code and adds minimal training overhead.
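
To make the Pass@1 versus Pass@k distinction concrete, here is a minimal sketch of the standard unbiased Pass@k estimator: the probability that at least one of k responses, drawn from n sampled responses of which c are correct, solves the problem. The function name `pass_at_k` and its arguments are illustrative assumptions. This shows the metric only, not necessarily the exact reward or advantage computation used in the training recipe.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n sampled responses, c of them correct.

    Hypothetical helper for illustration; not taken from the article's code.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # contains no correct response; math.comb returns 0 when n - c < k.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Same 8 samples with 2 correct, scored under Pass@1 and Pass@4.
    print(pass_at_k(8, 2, 1))
    print(pass_at_k(8, 2, 4))
```

With 8 samples and 2 correct, Pass@1 is 0.25 while Pass@4 is about 0.79, so a Pass@k objective gives credit to a policy that finds a correct answer in any of several diverse attempts.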

Python · RLHF · reward engineering

