2026 AI Model API Prices – DeepSeek V4 Flash Costs Only 1% of GPT‑5.5

The article provides a detailed April 2026 comparison of API pricing for six major AI model families—including DeepSeek, GLM‑5.1, Kimi, Claude, GPT‑5.5, and Gemini—covering official and proxy channels, context limits, discount periods, peak‑time surcharges, and practical selection recommendations for developers.

Overview

Data were collected through April 26 2026 and are expressed in USD per million tokens, drawn from the official documentation of DeepSeek, Anthropic, OpenAI, Google, Z.ai, and Moonshot AI, and from the proxy services OpenRouter and SiliconFlow. Prices may change; GPT‑5.5's server‑side API is not yet open, so the pricing shown is the announced preview.

DeepSeek V4

Released April 24 2026 in two variants: V4‑Flash (284 B total parameters, 13 B active) optimized for speed, and V4‑Pro (1.6 T total, 49 B active) optimized for performance. Both use an MoE architecture, support 1 M tokens of context and up to 384 K output tokens, and are MIT‑licensed. Official V4‑Flash API pricing is $0.14 input, $0.028 cache, $0.28 output per M tokens; OpenRouter offers the same rates. V4‑Pro carries a limited‑time 75 % discount until May 5 2026, bringing it down to $0.44 / $0.036 / $0.87 (input / cache / output).
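The discount arithmetic can be sanity‑checked in a few lines. A minimal Python sketch with the V4‑Pro rates above hardcoded; the example request's token counts are made up for illustration:

```python
# DeepSeek V4-Pro limited-time discount: a 75% discount means the published
# discounted rates are 25% of the base rates (all USD per 1M tokens).
DISCOUNT = 0.75
V4_PRO_DISCOUNTED = {"input": 0.44, "cache": 0.036, "output": 0.87}

def base_rate(discounted: float) -> float:
    """Recover the undiscounted rate from a discounted one."""
    return discounted / (1.0 - DISCOUNT)

def request_cost(rates: dict, in_tok: int, out_tok: int, cached_tok: int = 0) -> float:
    """USD cost for one request; cache-hit tokens are billed at the cache rate."""
    return ((in_tok - cached_tok) * rates["input"]
            + cached_tok * rates["cache"]
            + out_tok * rates["output"]) / 1_000_000

# Example: 200K-token prompt (half cache-hit), 20K-token completion.
cost = request_cost(V4_PRO_DISCOUNTED, 200_000, 20_000, cached_tok=100_000)
print(f"undiscounted output rate: ${base_rate(0.87):.2f}/M tokens")
print(f"request cost under discount: ${cost:.4f}")
```

Recovering the base rate this way ($0.87 / 0.25 = $3.48 output) also matches the undiscounted V4‑Pro figure used in cross‑model comparisons.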

GLM‑5.1 (Z.ai)

Open‑sourced on April 7 2026. Model size 744 B total parameters, 40 B active MoE, 200 K context, 128 K max output. Achieved 58.4 % on the SWE‑Bench Pro benchmark, the top open‑source score at its release (Kimi K2.6 edged past it later that month). Official pricing: $1.40 input, $0.26 cache, $4.40 output per M tokens. OpenRouter lists $1.05 input and $3.50 output (no cache price). During peak hours (14:00–18:00 UTC) the base price triples.
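The peak‑hour tripling makes request scheduling worth automating. A minimal sketch using the official rates above, assuming the multiplier applies uniformly to input and output tokens (cache‑hit pricing omitted for brevity):

```python
from datetime import datetime, timezone

# GLM-5.1 official rates, USD per 1M tokens (cache-hit pricing omitted here).
GLM_INPUT, GLM_OUTPUT = 1.40, 4.40
PEAK_START, PEAK_END = 14, 18      # peak window, 14:00-18:00 UTC
PEAK_MULTIPLIER = 3.0              # assumed to apply to all token types

def glm_cost(in_tok: int, out_tok: int, when: datetime) -> float:
    """USD cost for one request, tripled inside the UTC peak window."""
    hour = when.astimezone(timezone.utc).hour
    mult = PEAK_MULTIPLIER if PEAK_START <= hour < PEAK_END else 1.0
    return mult * (in_tok * GLM_INPUT + out_tok * GLM_OUTPUT) / 1_000_000

off_peak = glm_cost(50_000, 10_000, datetime(2026, 4, 26, 9, 0, tzinfo=timezone.utc))
peak = glm_cost(50_000, 10_000, datetime(2026, 4, 26, 15, 0, tzinfo=timezone.utc))
print(f"off-peak: ${off_peak:.3f}  peak: ${peak:.3f}")
```

For batch jobs that are not latency‑sensitive, simply shifting submission outside the four‑hour window cuts the GLM‑5.1 bill to a third.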

Kimi K2.6 (Moonshot AI)

Released April 20 2026. 1 T total, 32 B active MoE, multimodal, 256 K context, supports up to 300 concurrent sub‑agents. Scored 58.6 % on SWE‑Bench Pro, ranking first among open‑source models at release. Official Moonshot pricing: $0.60 input, $0.15 cache, $2.50 output per M tokens. OpenRouter charges $0.74 input and $4.66 output, so direct official API is strongly recommended.

Claude series (Anthropic)

Three tiers are offered: Opus 4.7 (flagship), Sonnet 4.6 (mid‑range), Haiku 4.5 (light). Context up to 1 M tokens (Haiku limited to 200 K). Cache‑input cost is about 10 % of standard input price, benefiting multi‑turn dialogues. Prices: Opus $5 input / $25 output, Sonnet $3 / $15, Haiku $1 / $5. Long‑context (>200 K) usage incurs higher rates.
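A rough model of how the ~10 % cache‑input rate compounds over a multi‑turn dialogue, using the Sonnet 4.6 rates above. The conversation shape (1 K new input and 500 output tokens per turn) is made up, and the sketch assumes the full history is replayed and cache‑priced on every turn:

```python
# Claude Sonnet 4.6 rates, USD per 1M tokens; cache-hit input assumed
# at 10% of the standard input rate, per the figures above.
SONNET_INPUT, SONNET_OUTPUT = 3.0, 15.0
CACHE_RATIO = 0.10

def turn_cost(history_tok: int, new_in_tok: int, out_tok: int, cached: bool) -> float:
    """One turn: the replayed history is billed at the cache or full input rate."""
    hist_rate = SONNET_INPUT * (CACHE_RATIO if cached else 1.0)
    return (history_tok * hist_rate
            + new_in_tok * SONNET_INPUT
            + out_tok * SONNET_OUTPUT) / 1_000_000

def dialogue_cost(turns: int, cached: bool) -> float:
    """Total cost of a dialogue with 1K new input, 500 output tokens per turn."""
    total, history = 0.0, 0
    for _ in range(turns):
        total += turn_cost(history, 1_000, 500, cached)
        history += 1_500   # this turn's input + output joins the history
    return total

print(f"10 turns, no cache:   ${dialogue_cost(10, False):.4f}")
print(f"10 turns, with cache: ${dialogue_cost(10, True):.4f}")
```

Because the replayed history grows quadratically with turn count, the cache rate is what keeps long conversations affordable, exactly the "multi‑turn dialogues" benefit noted above.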

GPT‑5.5 series (OpenAI)

GPT‑5.5 announced April 23 2026, currently only available inside ChatGPT and Codex; server‑side API is not yet open. Preview pricing is $5 input and $30 output per M tokens. GPT‑5.5 Pro preview is $30 / $180. Existing GPT‑5.4 models are live: GPT‑5.4 (272 K context) $2.50 / $15, GPT‑5.4 mini (400 K) $0.75 / $4.50, GPT‑5.4 nano (400 K) $0.20 / $1.25.

Gemini series (Google)

Flagship Gemini 3.1 Pro supports up to 2 M token context; prices double beyond 200 K tokens. Flash‑Lite is a lightweight, high‑throughput option at $0.25 input and $1.50 output for 1 M context. Other variants: Gemini 3.1 Pro $2 / $12 (200 K), Gemini 2.5 Flash $0.30 / $2.50 (1 M), Gemini 2.5 Pro $1.25 / $10 (200 K).
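A sketch of how the >200 K doubling changes a Gemini 3.1 Pro bill, assuming (as the wording above suggests) the doubled rate applies to the whole request once the prompt exceeds 200 K tokens:

```python
# Gemini 3.1 Pro base rates, USD per 1M tokens (<=200K-token prompts);
# the doubled long-context rate is assumed to apply to the whole request.
GEMINI_INPUT, GEMINI_OUTPUT = 2.0, 12.0
LONG_CONTEXT_THRESHOLD = 200_000

def gemini_cost(in_tok: int, out_tok: int) -> float:
    """USD cost for one request; rates double past the 200K prompt threshold."""
    mult = 2.0 if in_tok > LONG_CONTEXT_THRESHOLD else 1.0
    return mult * (in_tok * GEMINI_INPUT + out_tok * GEMINI_OUTPUT) / 1_000_000

# Crossing the threshold more than doubles the bill for a similar request.
print(f"190K prompt, 5K output: ${gemini_cost(190_000, 5_000):.3f}")
print(f"210K prompt, 5K output: ${gemini_cost(210_000, 5_000):.3f}")
```

The practical takeaway: chunking a long document to stay just under 200 K tokens per request can halve the effective rate, at the cost of losing cross‑chunk context.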

Cross‑model comparison

Sorting models by output price (per M tokens) shows DeepSeek V4‑Flash as the cheapest at $0.28, followed by Kimi K2.6 ($2.50), DeepSeek V4‑Pro ($3.48 undiscounted), GLM‑5.1 ($4.40), Gemini 3.1 Pro ($12), Claude Opus 4.7 ($25) and the GPT‑5.5 preview ($30). Taking V4‑Flash as the 1× baseline, relative output prices span up to roughly 107× for the GPT‑5.5 preview ($30 vs $0.28).
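The ranking and the roughly 107× spread can be reproduced directly from the output prices quoted above (a minimal Python sketch; the model labels are informal):

```python
# Output prices per 1M tokens, as quoted in the sections above.
OUTPUT_PRICE = {
    "DeepSeek V4-Flash": 0.28,
    "Kimi K2.6": 2.50,
    "DeepSeek V4-Pro": 3.48,
    "GLM-5.1": 4.40,
    "Gemini 3.1 Pro": 12.0,
    "Claude Opus 4.7": 25.0,
    "GPT-5.5 (preview)": 30.0,
}

ranked = sorted(OUTPUT_PRICE.items(), key=lambda kv: kv[1])
cheapest = ranked[0][1]
for name, price in ranked:
    # Relative price against the cheapest model (DeepSeek V4-Flash = 1x).
    print(f"{name:<20} ${price:>6.2f}   {price / cheapest:6.1f}x")
```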

Selection guidance

Ultra‑low cost / high‑throughput batch workloads: DeepSeek V4‑Flash ($0.14 / $0.28) is ideal for massive agent calls, classification, summarisation, code‑completion and similar high‑frequency tasks.

Open‑source + strong coding ability: Kimi K2.6 (official) and GLM‑5.1 via OpenRouter both rank at the top of the SWE‑Bench Pro leaderboard for April 2026, offering far lower costs than closed‑source flagships. Avoid OpenRouter for K2.6 because its output price ($4.66) is much higher than Moonshot’s official $2.50.

High‑precision reasoning / long‑document analysis: Claude Opus 4.7 ($5 / $25) remains the leader for complex reasoning and precise long‑text retrieval where quality outweighs cost.

Balanced closed‑source options: Gemini 2.5 Flash ($0.30 / $2.50) and GPT‑5.4 mini ($0.75 / $4.50) provide competitive mid‑range pricing for closed‑source models.

Important notes: DeepSeek V4‑Pro’s discount expires May 5 2026 UTC; GPT‑5.5 server‑side API is not yet open and its price may change after launch; GLM‑5.1 peak‑time pricing triples the base rate; all prices are subject to revision, so refer to official documentation for the final figures.
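To put the batch‑workload recommendation in dollars, a back‑of‑envelope daily‑cost comparison; the prices come from the sections above, while the workload shape (10 M input and 2 M output tokens per day) is hypothetical:

```python
# Per-1M-token (input, output) prices from the sections above.
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.5 (preview)": (5.00, 30.00),
}

def daily_cost(model: str, in_millions: float, out_millions: float) -> float:
    """USD per day for a workload of in/out millions of tokens."""
    pin, pout = PRICES[model]
    return in_millions * pin + out_millions * pout

# Hypothetical batch workload: 10M input and 2M output tokens per day.
for model in PRICES:
    print(f"{model:<20} ${daily_cost(model, 10, 2):>8.2f}/day")
```

At this volume the spread is roughly $2 vs $110 per day, which is why per‑token pricing, not benchmark score, usually decides the model for high‑frequency agent and classification pipelines.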

This article was proofread and formatted using PI Agent with deepseek‑v4‑pro max mode.
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

DeepSeek, Gemini, Claude, GLM-5.1, GPT-5.5, AI Model Pricing
Written by AI Engineer Programming

In the AI era, defining problems is often more important than solving them; here we explore AI's contradictions, boundaries, and possibilities.
