MathForge: Leveraging Hard Problems in RL to Boost Large‑Model Mathematical Reasoning (ICLR 2026)
MathForge tackles the long‑standing question of which math problems deserve focus in reinforcement‑learning‑based training. It introduces a difficulty‑aware optimizer (DGPO) and multi‑aspect question reformulation (MQR), which together prioritize harder‑but‑learnable questions and yield consistent performance gains across model sizes and modalities.
