Machine Heart
May 2, 2026 · Artificial Intelligence
Why GPT‑5.5 and Claude Opus 4.7 Score Below 1% on ARC‑AGI‑3 While Humans Achieve 100%
On the ARC‑AGI‑3 benchmark, GPT‑5.5 scores 0.43% and Claude Opus 4.7 scores 0.18%, and neither solves any of the 135 novel environments, whereas a six‑year‑old human solves them all. The analysis attributes the gap to three concrete failure modes and to the two models' differing compression abilities.
AI benchmark · ARC-AGI-3 · Claude Opus 4.7
