Google Reclaims AI Throne with Gemini 3.1 Pro, Achieving 77.1% ARC‑AGI‑2 Score

Google’s Gemini 3.1 Pro, the latest upgrade to the Gemini 3 series, achieves a verified 77.1% score on the ARC‑AGI‑2 reasoning benchmark, more than double the performance of Gemini 3 Pro, and leads on the GPQA, LiveCodeBench Pro, SWE‑Bench Verified, and MMMLU benchmarks. It is now rolling out to developers, enterprises, and consumers, with detailed pricing and integration options.

Machine Learning Algorithms & Natural Language Processing

Google announced a major update to its Gemini 3 Deep Think line, introducing Gemini 3.1 Pro as a more powerful core model designed to tackle complex scientific, research, and engineering problems.

Yao Shunyu, a researcher involved in the Gemini 3 Deep Think project, tweeted that even better models will continue to emerge.

On the ARC‑AGI‑2 benchmark, which evaluates a model’s ability to solve novel logical patterns, Gemini 3.1 Pro achieved a verified 77.1% score, more than double the reasoning performance of Gemini 3 Pro.

Internal tests also show strong results across several domains:

Scientific Knowledge: 94.3% on the GPQA Diamond benchmark.

Coding: Elo 2887 on LiveCodeBench Pro and 80.6 % on SWE‑Bench Verified.

Multilingual Understanding: 92.6% on the MMMLU test.

Third‑party evaluator Artificial Analysis ranked Gemini 3.1 Pro at the top of global AI model leaderboards, placing it 4.6 points ahead of Claude Opus 4.6 while costing less than half as much to run.

The model’s enhanced reasoning enables a range of applications, such as generating scalable SVG animations from text prompts, building real‑time aviation dashboards that ingest public telemetry, creating immersive 3D bird‑flocking simulations with generative music, and converting literary themes into modern, code‑driven personal portfolio websites.

Google is deploying Gemini 3.1 Pro across multiple channels: developers can access it via Google AI Studio’s Gemini API, Gemini CLI, the Antigravity agent‑development platform, and Android Studio; enterprises can use it through Vertex AI and Gemini Enterprise; consumers can reach it via the Gemini app and NotebookLM.

Pricing is tiered:

Input tokens: $2.00 per million up to 200k tokens, $4.00 per million beyond that.

Output tokens: $12.00 per million up to 200k tokens, $18.00 per million beyond that.

Context caching: $0.20–$0.40 per million tokens, plus $4.50 per million tokens per hour for storage.

Grounding searches: free for the first 5,000 queries per month, then $14 per 1,000 queries.
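To make the pricing above concrete, here is a minimal Python sketch of a cost estimator. It is an illustration, not Google's billing logic. The function names are hypothetical, and it assumes that the 200k threshold is measured on a request's input (prompt) size and selects the rate for both input and output tokens; the figures as quoted leave that open. The $0.20–$0.40 cache read rate is left out because the article does not say which conditions select each end of the range.

```python
# Rates quoted in the article, in USD.
TIER_THRESHOLD_TOKENS = 200_000          # boundary between the two price tiers
INPUT_RATE_PER_M = {"standard": 2.00, "long": 4.00}
OUTPUT_RATE_PER_M = {"standard": 12.00, "long": 18.00}
CACHE_STORAGE_PER_M_PER_HOUR = 4.50
FREE_GROUNDING_QUERIES = 5_000           # per month
GROUNDING_RATE_PER_1000 = 14.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the token cost of one request.

    Assumption: the tier is chosen by the prompt size and applies to
    both the input and the output rate.
    """
    tier = "long" if input_tokens > TIER_THRESHOLD_TOKENS else "standard"
    total = (input_tokens * INPUT_RATE_PER_M[tier]
             + output_tokens * OUTPUT_RATE_PER_M[tier])
    return total / 1_000_000


def cache_storage_cost(cached_tokens: int, hours: float) -> float:
    """Estimate the storage cost of keeping tokens in the context cache."""
    return cached_tokens / 1_000_000 * CACHE_STORAGE_PER_M_PER_HOUR * hours


def grounding_cost(queries_this_month: int) -> float:
    """Estimate the monthly grounding-search cost after the free allowance."""
    billable = max(0, queries_this_month - FREE_GROUNDING_QUERIES)
    return billable / 1_000 * GROUNDING_RATE_PER_1000


if __name__ == "__main__":
    print(f"100k in / 10k out: ${request_cost(100_000, 10_000):.2f}")
    print(f"300k in / 10k out: ${request_cost(300_000, 10_000):.2f}")
    print(f"1M cached tokens for 2 h: ${cache_storage_cost(1_000_000, 2):.2f}")
    print(f"7,000 grounding queries: ${grounding_cost(7_000):.2f}")
```

Under these assumptions, a 100k-token prompt with a 10k-token answer comes to about $0.32, while a 300k-token prompt with the same answer length comes to about $1.38 because both rates move to the higher tier.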

Databricks CTO Hanlin Tang noted that Gemini 3.1 Pro achieved a best‑in‑class result on the OfficeQA benchmark for factual reasoning over tabular and unstructured data. Cartwheel co‑founder Andrew Carr highlighted a significant improvement in the model’s understanding of 3D transformations, which addresses long‑standing rotation‑order bugs in animation pipelines.

Google emphasizes that Gemini 3.1 Pro is currently a preview version, with plans to pursue further breakthroughs in autonomous workflows before a full public release.
