Jul 17, 2025 · Artificial Intelligence

How OIBench & CoreCodeBench Expose the Real Coding Limits of LLMs

The Meituan-M17 team and Shanghai Jiao Tong University have introduced two new benchmarks: OIBench, which targets algorithmic problem solving, and CoreCodeBench, which targets engineering-level coding. Both aim to evaluate large language models' coding abilities more accurately. Across a range of tasks and models, the results reveal a substantial gap between the performance these models are claimed to have and the capability they actually demonstrate.

LLM evaluation · algorithmic assessment · artificial intelligence
