Machine Heart
Apr 26, 2026 · Artificial Intelligence
How MathForge Uses Hard Problems to Boost Large‑Model Mathematical Reasoning via Reinforcement Learning
MathForge addresses an overlooked question in reinforcement-learning training of large language models: how to make better use of problems that are mathematically challenging yet still learnable. It introduces two components, difficulty-aware group policy optimization (DGPO) and multi-aspect question reformulation (MQR), and reports consistent gains across model sizes and modalities.
DGPO · Large language models · MQR
