Jun 24, 2026 · Artificial Intelligence

STAR‑PólyaMath Beats GPT‑5.5 by 13.5% on Apex Benchmark Across Eight Major Math Competitions

STAR‑PólyaMath, a multi‑agent reasoning system from T‑STAR Lab and Microsoft Research, introduces an exploration‑reasoning‑verification harness that outperforms GPT‑5.5 on the toughest MathArena Apex 2025 problems by 13.5% and achieves perfect scores on six other top math competition benchmarks.

GPT-5.5 · LLM verification · STAR-PólyaMath

15 min read