Gemini 3.1 Pro Doubles Reasoning Scores, Beats Claude and GPT on ARC‑AGI‑2
Google’s Gemini 3.1 Pro jumps 148% to 77.1% on the ARC‑AGI‑2 benchmark, scores a perfect 100% on AIME 2025, and outperforms Claude Opus 4.6 and GPT‑5.2 on abstract reasoning, while offering a 1 M‑token context window, real‑time code demos, and immediate platform rollout.
Benchmark Highlights
Google DeepMind released official numbers showing Gemini 3.1 Pro reaches 77.1% on ARC‑AGI‑2, a 148% increase over the previous 31.1% version. It also scores 100% on the AIME 2025 mathematics competition, 80.6% on SWE‑Bench Verified, 2887 Elo on LiveCodeBench Pro, and 94.3% on GPQA Diamond.
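The headline “148% increase” is the relative improvement over the earlier 31.1% score, and the arithmetic checks out:

```python
# Relative improvement between the two reported ARC-AGI-2 scores.
old_score = 31.1  # previous Gemini version (%)
new_score = 77.1  # Gemini 3.1 Pro (%)

relative_jump = (new_score - old_score) / old_score * 100
print(f"{relative_jump:.1f}%")  # 147.9%, rounded to the reported 148%
```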
Comparison with Competitors
Against Claude Opus 4.6 (68.8% on ARC‑AGI‑2) and GPT‑5.2 (52.9%), Gemini 3.1 Pro leads in abstract reasoning. On the “Humanity’s Last Exam” benchmark, Claude slightly edges Gemini (53.1% vs. 51.4%). Coding scores are within 1% across the three models, while Gemini leads in terminal programming by about 3 points.
Direct Visual Comparison
Google posted a side‑by‑side video showing faster response and higher quality from 3.1 Pro versus 3.0 on the same tasks.
Practical Demos
City Planner Application
The model generated a full city‑planning app handling complex terrain, road and infrastructure layout, traffic flow simulation, and 3D visualization, integrating constraints and multi‑objective optimization without requiring the user to write code.
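The app’s internals were not published, but multi‑objective optimization of competing planning goals is often reduced to a weighted‑sum scalarization. A minimal sketch, with entirely hypothetical objective names and weights:

```python
def score_plan(metrics, weights):
    """Weighted-sum scalarization: collapse several objectives
    (traffic flow, cost, green space) into a single score."""
    return sum(weights[k] * metrics[k] for k in weights)

# Hypothetical candidate layouts; cost is negated so higher is better.
plans = [
    {"traffic": 0.8, "cost": -0.3, "green": 0.5},
    {"traffic": 0.6, "cost": -0.1, "green": 0.9},
]
weights = {"traffic": 0.5, "cost": 0.3, "green": 0.2}

# Pick the layout with the best combined score.
best = max(plans, key=lambda m: score_plan(m, weights))
```

A real planner would search a much larger design space and may use Pareto‑front methods instead of a fixed weighting; this only illustrates the trade‑off structure.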
3‑D Flocking Simulation
A demo of starling flocking produced code implementing the boids algorithm, hand‑gesture interaction, dynamic audio, and 3‑D rendering, illustrating “creative programming” beyond business logic.
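The demo’s code was not released, but the boids model behind it is well known: each bird steers by three local rules (separation, alignment, cohesion). A minimal 3‑D sketch, without the demo’s gesture, audio, or rendering layers:

```python
import random

def step(boids, sep_r=1.0, sep_w=0.1, align_w=0.05, coh_w=0.01,
         max_speed=2.0, dt=1.0):
    """Advance a 3-D boids flock one time step.

    Each boid is a dict with 'pos' and 'vel' as [x, y, z] lists.
    Applies the three classic rules: separation (avoid crowding),
    alignment (match neighbours' velocity), cohesion (move toward
    the flock centroid).
    """
    n = len(boids)
    if n < 2:
        return [dict(b) for b in boids]
    out = []
    for b in boids:
        sep = [0.0, 0.0, 0.0]
        avg_vel = [0.0, 0.0, 0.0]
        centroid = [0.0, 0.0, 0.0]
        for other in boids:
            if other is b:
                continue
            for k in range(3):
                avg_vel[k] += other['vel'][k] / (n - 1)
                centroid[k] += other['pos'][k] / (n - 1)
            d = sum((b['pos'][k] - other['pos'][k]) ** 2
                    for k in range(3)) ** 0.5
            if 0 < d < sep_r:  # push away from too-close neighbours
                for k in range(3):
                    sep[k] += (b['pos'][k] - other['pos'][k]) / d
        vel = [b['vel'][k]
               + sep_w * sep[k]
               + align_w * (avg_vel[k] - b['vel'][k])
               + coh_w * (centroid[k] - b['pos'][k])
               for k in range(3)]
        speed = sum(v * v for v in vel) ** 0.5
        if speed > max_speed:  # cap speed so the flock stays stable
            vel = [v * max_speed / speed for v in vel]
        out.append({'pos': [b['pos'][k] + vel[k] * dt for k in range(3)],
                    'vel': vel})
    return out

# Usage: a 30-bird flock starting at rest in a random cloud.
flock = [{'pos': [random.uniform(-5, 5) for _ in range(3)],
          'vel': [0.0, 0.0, 0.0]} for _ in range(30)]
for _ in range(10):
    flock = step(flock)
```

A production demo would add spatial hashing so each boid only checks nearby neighbours; this O(n²) version keeps the rules readable.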
Real‑time Code Verification
During a code‑review session, Gemini 3.1 Pro detected uncertainty about a SQL query and automatically launched a PostgreSQL container to test it, demonstrating a “try‑and‑verify” reasoning style.
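The article describes spinning up a PostgreSQL container; the same “try‑and‑verify” pattern can be sketched with Python’s built‑in sqlite3 so the example stays self‑contained (the function name and schema here are illustrative, not the model’s actual code):

```python
import sqlite3

def try_and_verify(candidate_sql, setup_statements):
    """Run a candidate query against a disposable in-memory database
    instead of guessing whether it is valid.

    Returns (ok, result): the fetched rows on success, or the error
    message on failure so a reviewer can see what went wrong.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for stmt in setup_statements:  # build throwaway schema + data
            conn.execute(stmt)
        try:
            rows = conn.execute(candidate_sql).fetchall()
            return True, rows
        except sqlite3.Error as exc:
            return False, str(exc)
    finally:
        conn.close()

# Usage: a valid query succeeds, a typo is caught before review sign-off.
setup = [
    "CREATE TABLE orders (id INTEGER, total REAL)",
    "INSERT INTO orders VALUES (1, 9.5), (2, 20.0)",
]
ok, rows = try_and_verify("SELECT SUM(total) FROM orders", setup)
bad_ok, err = try_and_verify("SELECT totel FROM orders", setup)
```

Swapping sqlite3 for a real PostgreSQL container would exercise engine‑specific syntax, which is presumably why the model chose one.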
Long‑Context Performance
While the model accepts up to 1 M input tokens and 64 k output tokens, measured long‑context accuracy drops sharply: 84.9% at 128 k context versus just 26.3% at 1 M, indicating real limits for ultra‑long texts.
Availability
Gemini 3.1 Pro is already live on Gemini App (free trial), Google AI Studio, Vertex AI, and NotebookLM (Pro/Ultra subscription). API pricing matches the previous Gemini 3 Pro.
References
Google DeepMind model page: https://deepmind.google/models/gemini/pro/
Sundar Pichai announcement: https://x.com/sundarpichai/status/2024516418855981298
Google tweet: https://x.com/google/status/2024519455389192204
3.0 vs 3.1 comparison video: https://x.com/google/status/2024598510499426433
Gemini API documentation: https://ai.google.dev/gemini-api/docs/
ShiZhen AI
Tech blogger with over 10 years of experience at leading tech firms; AI efficiency and delivery expert focused on AI productivity. Covers tech gadgets, AI‑driven efficiency, and the AI leisure community. 🛰 szzdzhp001
