AI Insight Log
Mar 16, 2026 · Artificial Intelligence

Cursor’s Own Large‑Model Benchmark Shakes Up SWE‑bench Rankings

Although SWE‑bench scores for top coding models now differ by only a tenth of a point, Cursor’s newly released CursorBench reveals dramatic ranking changes, highlights three fundamental flaws in public benchmarks, and introduces token‑efficiency as a crucial evaluation dimension.

AI coding · CursorBench · SWE-bench
8 min read