Meituan Technology Team
Jul 17, 2025 · Artificial Intelligence
How OIBench & CoreCodeBench Expose the Real Coding Limits of LLMs
The Meituan‑M17 team and Shanghai Jiao Tong University introduced two new benchmarks, OIBench and CoreCodeBench, to more accurately evaluate large language models' algorithmic and engineering coding abilities, revealing a substantial gap between claimed performance and actual capability across a range of tasks and models.
LLM evaluation · algorithmic assessment · artificial intelligence
28 min read
