Machine Heart
May 2, 2026 · Artificial Intelligence

Why GPT‑5.5 and Claude Opus 4.7 Score Below 1% on ARC‑AGI‑3 While Humans Achieve 100%

On the ARC‑AGI‑3 benchmark, GPT‑5.5 (0.43%) and Claude Opus 4.7 (0.18%) solve essentially none of the 135 novel environments, whereas a six‑year‑old human solves them all. The analysis attributes the gap to three concrete failure modes and to the two models' differing compression abilities.

AI benchmark · ARC-AGI-3 · Claude Opus 4.7
Lao Guo's Learning Space
Apr 1, 2026 · Artificial Intelligence

Humans Achieve 100% While Top AI Models Score Below 0.4% on ARC‑AGI‑3 Benchmark

In the ARC‑AGI‑3 test, 486 randomly recruited human participants solved all of the 150+ game‑based puzzles, a 100% success rate, in a median of 7.4 minutes. Leading models such as GPT‑5, Claude Opus 4.6, Gemini 3.1 Pro, and Grok 4.20 scored at most 0.37%, exposing a stark gap in meta‑cognitive reasoning.

AGI · ARC-AGI-3 · Reinforcement Learning