Which Large Language Model Leads in Intelligence, Speed, and Cost? 2026 Rankings Revealed
The 2026 Artificial Analysis report ranks the top global large language models by intelligence score, output speed in tokens per second, and cost per million tokens. Gemini 3.1 Pro Preview and GPT‑5.4 lead in intelligence, NVIDIA Nemotron 3 Super leads in speed, and DeepSeek V3.2 and gpt‑oss‑120B are the most cost‑effective options.
Overview
The AI evaluation platform Artificial Analysis released a new global ranking of large language models (LLMs) across three key dimensions: Intelligence Index, output tokens per second (speed), and price in USD per million tokens. The ranking compares models from major providers such as Google, OpenAI, NVIDIA, and Anthropic, as well as Chinese open‑source models.
Intelligence Index (Core Capability)
The Intelligence Index measures a model’s overall reasoning, knowledge, and creativity. The top scores are:
Gemini 3.1 Pro Preview (Google) : 57 points (tied for 1st)
GPT‑5.4 (xhigh, OpenAI) : 57 points (tied for 1st)
Claude Opus 4.6 (max, Anthropic) : 53 points
Claude Sonnet 4.6 (max, Anthropic) : 52 points
GLM‑5 (Zhipu AI) : 50 points – the highest among open‑source and Chinese models, ranking 5th globally
DeepSeek V3.2 : 42 points – moderate intelligence but excellent cost‑performance
Commentary: Gemini 3.1 Pro Preview and GPT‑5.4 remain at the intelligence ceiling. GLM‑5 continues to improve and stays within the global top five, demonstrating that Chinese models can match the leading international tier.
Output Speed (Tokens per Second)
Speed directly affects the fluidity of chat, writing, and code generation. The fastest models are:
NVIDIA Nemotron 3 Super : 455 tokens/s (rank 1)
gpt‑oss‑120B (high) : 279 tokens/s (rank 2)
Grok 4.20 Beta 0309 : 216 tokens/s
Gemini 3 Flash : 166 tokens/s
Gemini 3.1 Pro Preview : 125 tokens/s
GPT‑5.4 (xhigh) : 73 tokens/s
GLM‑5 : 67 tokens/s
Claude Sonnet 4.6 (max) : 55 tokens/s
Claude Opus 4.6 (max) : 48 tokens/s
DeepSeek V3.2 : 28 tokens/s
Commentary: NVIDIA Nemotron 3 Super dominates speed, earning the “lightning” title. Grok 4.20 and Gemini 3 Flash also offer strong performance for real‑time applications, while GLM‑5 provides moderate speed suitable for many use cases.
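To put these throughput figures in perspective, a minimal sketch below estimates how long a 1,000‑token response would take at a few of the reported speeds. The response length is an illustrative assumption; the rates come from the ranking above.

```python
# Estimate wall-clock generation time for a fixed-length response
# at the output speeds reported above (tokens per second).
speeds = {
    "NVIDIA Nemotron 3 Super": 455,
    "gpt-oss-120B (high)": 279,
    "GPT-5.4 (xhigh)": 73,
    "DeepSeek V3.2": 28,
}

response_tokens = 1_000  # hypothetical medium-length answer

for model, tps in speeds.items():
    seconds = response_tokens / tps
    print(f"{model}: {seconds:.1f} s")
```

At these rates the same answer arrives in roughly 2 seconds on the fastest model and over half a minute on the slowest, which is why speed matters for interactive chat.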
Price (USD per 1M Tokens)
Cost per million tokens influences large‑scale deployment budgets. The cheapest options are:
gpt‑oss‑120B : $0.30
DeepSeek V3.2 : $0.30 (tied for cheapest)
NVIDIA Nemotron 3 Super : $0.40
Gemini 3 Flash : $1.10
GLM‑5 : $1.60
Grok 4.20 Beta 0309 : $3.00
Gemini 3.1 Pro Preview : $4.50
GPT‑5.4 (xhigh) : $5.60
Claude Sonnet 4.6 (max) : $6.00
Claude Opus 4.6 (max) : $10.00
Commentary: DeepSeek V3.2 and gpt‑oss‑120B achieve the best price‑performance, while top‑tier models like Gemini 3.1 Pro Preview, GPT‑5.4, and Claude Opus command premium prices that are justified in scenarios demanding the highest intelligence.
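A short sketch shows how these per‑million‑token prices translate into a monthly budget. The workload size is a hypothetical assumption, and the listed figure is treated as a single blended rate (real providers often price input and output tokens separately).

```python
# Project a monthly bill from the per-million-token prices listed above.
# Assumption: the listed figure is a blended rate; real pricing may
# split input and output tokens.
prices_per_million = {
    "gpt-oss-120B": 0.30,
    "DeepSeek V3.2": 0.30,
    "Gemini 3.1 Pro Preview": 4.50,
    "Claude Opus 4.6 (max)": 10.00,
}

monthly_tokens = 500_000_000  # hypothetical workload: 500M tokens/month

for model, price in prices_per_million.items():
    cost = monthly_tokens / 1_000_000 * price
    print(f"{model}: ${cost:,.2f}/month")
```

At this volume the cheapest models cost about $150 per month versus $5,000 for the most expensive, a gap that dominates the decision for high‑frequency workloads.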
Choosing the Right Model
Maximum intelligence (complex writing, reasoning) : Gemini 3.1 Pro Preview or GPT‑5.4 (57 points).
Maximum speed (real‑time chat, code generation) : NVIDIA Nemotron 3 Super (455 tokens/s).
Best cost‑performance (high‑frequency daily use, startups) : DeepSeek V3.2 or gpt‑oss‑120B ($0.30 per M tokens).
Best choice for Chinese users : GLM‑5 (top‑5 intelligence, reasonable price) and DeepSeek V3.2 (cheapest).
There is no absolute “best” model; the optimal choice depends on the specific trade‑off between intelligence, speed, and budget.
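The trade‑off above can be made concrete with a small scoring sketch: normalize each model's intelligence, speed, and (inverted) price across the candidates, then pick the highest weighted sum. The weights and normalization are illustrative assumptions, not part of the Artificial Analysis methodology, and only models with all three figures in the ranking are included.

```python
# Hypothetical composite score: weighted sum of normalized intelligence,
# speed, and inverse price. Weights are illustrative, not from the report.
models = {
    # name: (Intelligence Index, tokens/s, USD per 1M tokens), from above
    "Gemini 3.1 Pro Preview": (57, 125, 4.50),
    "GPT-5.4 (xhigh)": (57, 73, 5.60),
    "Claude Opus 4.6 (max)": (53, 48, 10.00),
    "GLM-5": (50, 67, 1.60),
    "DeepSeek V3.2": (42, 28, 0.30),
}

def composite(weights):
    """Return the best model for (w_intelligence, w_speed, w_price)."""
    intel = [m[0] for m in models.values()]
    speed = [m[1] for m in models.values()]
    price = [m[2] for m in models.values()]

    def norm(x, values):
        # Scale to [0, 1] across the candidate set.
        return (x - min(values)) / (max(values) - min(values))

    w_i, w_s, w_p = weights
    scores = {
        name: (w_i * norm(i, intel)
               + w_s * norm(s, speed)
               + w_p * (1 - norm(p, price)))  # cheaper scores higher
        for name, (i, s, p) in models.items()
    }
    return max(scores, key=scores.get)

print(composite((0.7, 0.2, 0.1)))  # intelligence-heavy priorities
print(composite((0.1, 0.1, 0.8)))  # budget-heavy priorities
```

With intelligence‑heavy weights the sketch picks Gemini 3.1 Pro Preview; with budget‑heavy weights it picks DeepSeek V3.2, mirroring the recommendations above.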
