Google Reclaims AI Crown with Gemini 3.1 Pro – Better Models Ahead

Google’s Gemini 3.1 Pro, the latest upgrade to its Gemini 3 series, achieves a verified 77.1% score on the ARC‑AGI‑2 reasoning benchmark—over twice the performance of Gemini 3 Pro—while also leading in GPQA, LiveCodeBench, SWE‑Bench and MMMLU tests, offering advanced code‑generation, multimodal and 3D capabilities at lower cost, and is being rolled out to developers, enterprises and consumers.

Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Google Reclaims AI Crown with Gemini 3.1 Pro – Better Models Ahead

Google announced Gemini 3.1 Pro, an upgraded core model for the Gemini 3 series, aimed at tackling complex problems that simple‑answer models cannot solve. The company claims the model delivers a significant leap in reasoning ability.

On the ARC‑AGI‑2 benchmark, which evaluates a model’s capacity to solve novel logical patterns, Gemini 3.1 Pro achieved a verified 77.1% score, more than double the reasoning performance of Gemini 3 Pro. Additional internal tests show strong results across several domains:

Scientific Knowledge: 94.3% on the GPQA diamond‑level test.

Code Generation: Elo 2887 on LiveCodeBench Pro and 80.6% on SWE‑Bench Verified.

Multimodal Understanding: 92.6% on the MMMLU test.

Third‑party evaluation by Artificial Analysis placed Gemini 3.1 Pro at the top of global AI model rankings, ahead of Claude Opus by 4.6 points while costing less than half as much to run.

The model’s pricing is tiered by token usage: prompts up to 200 k tokens cost $2.00 per million tokens, beyond that $4.00; outputs cost $12.00/$18.00 per million tokens respectively; context caching adds $0.20–$0.40 per million tokens plus $4.50 per million tokens per hour for storage; grounding searches are free for the first 5 000 queries per month, then $14 per 1 000 queries.

Gemini 3.1 Pro demonstrates several practical capabilities:

Code‑based Animation: Generates SVG animations from textual prompts, producing resolution‑independent, lightweight graphics.

Complex System Integration: Builds real‑time dashboards, such as an aviation instrument panel that streams live telemetry from the International Space Station.

Interactive 3D Design: Writes code for 3D flock‑simulation with gesture control and generative music, enabling immersive prototypes for researchers and designers.

Creative Coding: Transforms literary themes (e.g., Wuthering Heights) into modern, stylistic personal‑website designs that capture the novel’s atmosphere.

Google is deploying Gemini 3.1 Pro across consumer and developer products: preview access via Google AI Studio’s Gemini API, Gemini CLI, the Antigravity agent‑development platform, Android Studio, Vertex AI, Gemini Enterprise, and the Gemini app/NotebookLM for end users.

Partners have begun integrating the preview, reporting notable gains in reliability and efficiency. Experts such as Databricks CTO Hanlin Tang highlighted the model’s best‑in‑class results on the OfficeQA factual‑reasoning benchmark, while Cartwheel co‑founder Andrew Carr noted significant improvements in 3D transformation understanding.

Google emphasizes that Gemini 3.1 Pro is a preview version; further breakthroughs are planned for autonomous workflows, with a full public release expected soon.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Google AIAI benchmarkingGemini 3.1 ProARC-AGI-2
Machine Learning Algorithms & Natural Language Processing
Written by

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.