GPT-5.5 Launches: How It Stacks Up Against Claude Opus 4.7

OpenAI released GPT-5.5 in three variants, matching GPT-5.4's latency while boosting benchmark scores across Terminal‑Bench, GDPval, FrontierMath, ARC‑AGI‑2 and more. Pricing doubles, however, and Claude Opus 4.7 still leads on some tests, underscoring fierce model‑level competition.

AI Programming Lab

OpenAI announced the midnight release of GPT-5.5, adding three versions—GPT-5.5, GPT-5.5 Thinking, and GPT-5.5 Pro—available to all ChatGPT Plus, Pro, Business, and Enterprise users. The rollout follows a rapid cadence of roughly two months per version since GPT‑5 debuted.

OpenAI positions GPT-5.5 as an "agentic" model that can complete multi‑step tasks autonomously, emphasizing tool use, self‑checking, and continuous operation. The key claim is that per‑token latency matches GPT‑5.4 in real‑world serving, so the model is smarter without being slower and consumes fewer tokens.

Benchmark results back the claim. On Terminal‑Bench 2.0 (command‑line workflow), GPT‑5.5 scores 82.7 % versus 75.1 % for GPT‑5.4, 69.4 % for Claude Opus 4.7, and 68.5 % for Gemini 3.1 Pro. GDPval (44 professional domains) shows GPT‑5.5 at 84.9 % against 80.3 % for Claude and 67.3 % for Gemini.

On FrontierMath Tier 4 (hardest math), GPT‑5.5 reaches 35.4 % while GPT‑5.4 lags at 27.1 % and Claude at 22.9 %; the Pro variant climbs to 39.6 %. ARC‑AGI‑2 (abstract reasoning) jumps to 85.0 % from 73.3 % (GPT‑5.4). CyberGym (cyber‑security) scores 81.8 % versus 73.1 % for Claude. Tau2‑bench (telecom workflow) hits 98.0 % without prompt‑tuning, up from 92.8 % for GPT‑5.4.

Exceptions appear: Claude Opus 4.7 still leads on SWE‑Bench Pro (64.3 % vs 58.6 % for GPT‑5.5), though OpenAI flags a memorization issue for the benchmark. Similarly, Claude outperforms GPT‑5.5 on MCP Atlas (79.1 % vs 75.3 %).
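The scores quoted above can be collected into a small table for a per-benchmark leader check. A minimal sketch using only the numbers from this article (benchmarks where a model's score wasn't reported are simply omitted):

```python
# Benchmark scores quoted in the article, in percent.
SCORES = {
    "Terminal-Bench 2.0": {"GPT-5.5": 82.7, "GPT-5.4": 75.1,
                           "Claude Opus 4.7": 69.4, "Gemini 3.1 Pro": 68.5},
    "GDPval":             {"GPT-5.5": 84.9, "Claude Opus 4.7": 80.3,
                           "Gemini 3.1 Pro": 67.3},
    "FrontierMath Tier 4": {"GPT-5.5": 35.4, "GPT-5.4": 27.1,
                            "Claude Opus 4.7": 22.9},
    "SWE-Bench Pro":      {"GPT-5.5": 58.6, "Claude Opus 4.7": 64.3},
    "MCP Atlas":          {"GPT-5.5": 75.3, "Claude Opus 4.7": 79.1},
}

# Print the leading model for each benchmark.
for bench, by_model in SCORES.items():
    leader = max(by_model, key=by_model.get)
    print(f"{bench}: {leader} leads at {by_model[leader]}%")
```

Run this way, GPT‑5.5 tops the first three benchmarks while Claude Opus 4.7 keeps SWE‑Bench Pro and MCP Atlas, matching the exceptions noted above.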

Pricing doubles: the API costs $5 per million input tokens and $30 per million output tokens, versus $2.50/$15 for GPT‑5.4. GPT‑5.5 Pro keeps the previous Pro tier's $30/$180 structure. Context windows expand to 400K tokens in Codex, and API context is promised to reach 1M tokens.
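At the listed rates the doubling is easy to verify. A quick cost check (the monthly token volumes below are hypothetical, chosen only for illustration):

```python
# Per-million-token API rates quoted in the article, in USD.
RATES = {
    "gpt-5.4": {"input": 2.5, "output": 15.0},
    "gpt-5.5": {"input": 5.0, "output": 30.0},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a token volume at the model's listed rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Hypothetical workload: 50M input + 10M output tokens per month.
for model in RATES:
    print(f"{model}: ${monthly_cost(model, 50_000_000, 10_000_000):,.2f}")
```

Since both the input and output rates exactly doubled, any workload's bill doubles with them regardless of the input/output mix.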

Codex receives five capability upgrades: (1) Browser automation can now click, fill forms, take screenshots, and iterate; (2) Document generation works directly in Microsoft Office and Google Drive with an integrated previewer; (3) Computer‑Use enhancements enable screen reading, clicking, typing, and cross‑app context passing; (4) gpt‑image‑2 is integrated for on‑the‑fly illustration generation; (5) Fast mode in Codex offers 1.5× speed at 2.5× cost.
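The Fast mode trade-off is worth a back-of-envelope check: 1.5× speed for 2.5× cost means each unit of speedup carries a price premium.

```python
# Codex Fast mode as described: 1.5x speed for 2.5x cost.
SPEEDUP = 1.5
COST_MULTIPLIER = 2.5

def cost_per_speedup(speed: float, cost: float) -> float:
    """How much the cost multiplier exceeds the speed multiplier."""
    return cost / speed

# Roughly 1.67x cost per 1x of speed gained.
print(f"Fast mode: {cost_per_speedup(SPEEDUP, COST_MULTIPLIER):.2f}x cost per unit of speedup")
```

In other words, Fast mode is a convenience for latency-sensitive work, not an efficiency win per token.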

User feedback highlights the model’s practical impact. Dan Shipper (CEO of Every) calls GPT‑5.5 “the first coding model with serious conceptual clarity,” noting it reproduced a developer’s refactor solution that GPT‑5.4 missed. Cursor CEO Michael Truell observes that GPT‑5.5 stays on task significantly longer, a crucial advantage for agentic workflows. MagicPath CEO Pietro Schirano reports merging a large, heavily‑changed branch in about 20 minutes, describing the experience as working with a “higher intelligence.” An NVIDIA engineer likens losing access to GPT‑5.5 to “having a limb amputated.”

The author switched his Codex primary model to GPT‑5.5 and plans further hands‑on testing of auto‑review and browser automation before publishing a detailed follow‑up.

OpenAI’s rapid GPT‑5.5 rollout puts pressure on Anthropic’s Claude, suggesting the short‑term model arms race will continue and the ultimate “winner” remains uncertain.
