Meituan Technology Team
Nov 27, 2025 · Artificial Intelligence
AMO‑Bench: A New High‑Difficulty, Original Math Reasoning Benchmark for LLMs
AMO‑Bench, released by Meituan's LongCat team, is a 50‑question math reasoning benchmark that pairs original, IMO‑level problems with automated scoring. Even the top large language models reach a best accuracy of only about 52% on it, which exposes their current limits and makes the benchmark a more discriminative evaluation tool for future model improvements.
AI evaluation · AMO-Bench · benchmark
