Machine Heart
Jun 24, 2026 · Artificial Intelligence
STAR‑PólyaMath Beats GPT‑5.5 by 13.5% on Apex Benchmark Across Eight Major Math Competitions
STAR‑PólyaMath, a multi‑agent reasoning system from T‑STAR Lab and Microsoft Research, introduces an exploration‑reasoning‑verification harness that outperforms GPT‑5.5 by 13.5% on the toughest MathArena Apex 2025 problems and achieves perfect scores on six other top math competition benchmarks.
GPT-5.5 · LLM verification · STAR-PólyaMath
