Tagged articles
1 article
Machine Heart
May 6, 2026 · Artificial Intelligence

Can Adaptive Guidance Unlock Small Model Reasoning? Introducing G²RPO‑A

The paper identifies reward sparsity as the core obstacle for small language models in reinforcement‑learning‑based reasoning. It proposes G²RPO‑A, which injects high‑quality thinking trajectories into training and dynamically adjusts the guidance length, and it reports large accuracy gains on math and code benchmarks: Qwen3‑1.7B improves from 50.96% to 67.21% on MATH500 and from 46.08% to 75.93% on HumanEval.

Code Generation · G²RPO‑A · adaptive guidance
10 min read