How a Chinese Team Reclaimed the Top Spot on the AI Agent Leaderboard After the OpenAI Ranking Scandal
This article examines the MLE-Bench benchmark, the new SOTA score achieved by Baidu's Famou 2.0 agent, the controversy over Disarray's cheating, and real-world deployments in automotive, banking, and aerospace. Together, these show how Harness Engineering is becoming the decisive factor in AI agent performance.
