GPT-5.5 Launch: A New Agentic AI for Real‑World Work

OpenAI’s GPT‑5.5 is now available via the API. OpenAI claims agentic capabilities that let the model autonomously plan, execute, and verify complex programming, knowledge‑work, and scientific tasks. It is said to match GPT‑5.4 on latency while delivering higher benchmark scores, stronger security controls, and a tiered pricing model.

JavaEdge

Preface

Updated 2026‑04‑24. GPT‑5.5 and GPT‑5.5 Pro are now available via the API with additional safety controls.

The model improves goal understanding, autonomous task planning, tool use, result verification, and uncertainty handling, reducing the need for step‑by‑step prompting.
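
The plan, execute, and verify cycle described above can be pictured with a small sketch. This is an illustration only, not OpenAI's implementation: the `Step`, `AgentRun`, and `run_plan` names and the two toy tools are hypothetical, and the plan is hard-coded here, whereas in a real agent the model would produce it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    tool: str                      # name of the tool to call
    arg: str                       # argument passed to the tool
    check: Callable[[str], bool]   # verifier applied to the tool's result


@dataclass
class AgentRun:
    log: list = field(default_factory=list)  # (tool, attempt, passed) tuples
    succeeded: bool = True


def run_plan(steps, tools, max_retries=2):
    """Execute each step, verify its result, and retry when a check fails."""
    run = AgentRun()
    for step in steps:
        for attempt in range(1, max_retries + 2):
            result = tools[step.tool](step.arg)
            passed = step.check(result)
            run.log.append((step.tool, attempt, passed))
            if passed:
                break
        else:
            # Every attempt failed verification: stop instead of guessing.
            run.succeeded = False
            break
    return run


# Hypothetical tools standing in for real integrations.
TOOLS = {
    "upper": lambda s: s.upper(),
    "count": lambda s: str(len(s.split())),
}

# Hard-coded plan; an agentic model would generate this from the goal.
PLAN = [
    Step("upper", "ship it", check=lambda r: r == "SHIP IT"),
    Step("count", "one two three", check=lambda r: r == "3"),
]

run = run_plan(PLAN, TOOLS)
print(run.succeeded, run.log)
```

The point of the sketch is the control flow: each tool call is followed by an explicit check, and a step that keeps failing halts the run rather than passing an unverified result downstream.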

OpenAI evaluated GPT‑5.5 with internal and external red‑team testing and feedback from roughly 200 trusted early users.

Model capabilities

Agentic programming

GPT‑5.5 achieves state‑of‑the‑art results on three programming benchmarks while using fewer tokens than GPT‑5.4.

Terminal‑Bench 2.0: 82.7% accuracy (GPT‑5.5) vs 75.1% (GPT‑5.4).

SWE‑Bench Pro: 58.6% (GPT‑5.5), the highest reported score.

Expert‑SWE (internal): 73.1% (GPT‑5.5) vs 68.5% (GPT‑5.4).

Across these tests GPT‑5.5 uses fewer tokens and needs fewer retries.
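
To put the two head‑to‑head comparisons in perspective, the short calculation below converts the reported scores into absolute gains (percentage points) and relative gains (percent over the GPT‑5.4 score). The numbers are the ones quoted in this article; the `gains` helper is only an illustrative name. SWE‑Bench Pro is left out because no GPT‑5.4 figure is given for it.

```python
# Scores in percent as reported above: (GPT-5.5, GPT-5.4).
SCORES = {
    "Terminal-Bench 2.0": (82.7, 75.1),
    "Expert-SWE (internal)": (73.1, 68.5),
}


def gains(new, old):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return absolute, relative


for name, (new, old) in SCORES.items():
    absolute, relative = gains(new, old)
    print(f"{name}: +{absolute} points, +{relative}% relative")
```

On these figures, Terminal‑Bench 2.0 improves by 7.6 points (about 10.1% relative) and Expert‑SWE by 4.6 points (about 6.7% relative).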

In Codex, the model can perform end‑to‑end engineering tasks such as implementation, refactoring, debugging, testing, and verification. Early testers observed stronger system‑structure understanding, better error diagnosis, and the ability to propagate changes across large codebases.

Knowledge work

GPT‑5.5’s programming strengths extend to office workflows. It more reliably interprets intent, retrieves information, extracts key points, invokes tools, verifies results, and produces final deliverables.

In Codex it outperforms GPT‑5.4 on document, spreadsheet, and presentation generation, and excels at converting chaotic inputs into structured plans. Integrated computer‑operation abilities enable screen content recognition, clicking, typing, and switching between tools.

Internal adoption data: over 85% of OpenAI staff use Codex weekly. Reported results include cutting tax‑document processing time by two weeks and saving 5 to 10 hours per week on report generation.

Scientific research

GPT‑5.5 leads on GeneBench and BixBench and has contributed to new mathematical proofs (Ramsey numbers) verified with Lean.

Researchers use the model as a “research partner” for paper review, design analysis, and multi‑turn reasoning, converting expert ideas into tools and results.

Inference efficiency

To keep per‑token latency on par with GPT‑5.4 while boosting performance, OpenAI redesigned the inference stack and worked closely with NVIDIA on the hardware side. Codex and GPT‑5.5 were themselves used for traffic analysis and load‑balancing improvements, raising generation speed by more than 20%.

Security enhancements

GPT‑5.5 adds stricter safety controls, extensive red‑team testing for advanced cyber‑capabilities, and a “trusted‑access” mechanism for legitimate defensive use. The model is classified as a “high‑risk capability” but remains below the “critical” level.

Availability and pricing

GPT‑5.5 is available in ChatGPT and Codex for Plus, Pro, Business, and Enterprise tiers. GPT‑5.5 Pro targets Pro, Business, and Enterprise users.

API pricing:

Input: $5 per million tokens

Output: $30 per million tokens

GPT‑5.5 Pro pricing:

Input: $30 per million tokens

Output: $180 per million tokens

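Although the per‑token rates are higher, OpenAI's position is that GPT‑5.5 needs fewer tokens and fewer retries per task, so the total cost of completing a task can still come out favorable.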
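
A rough way to check that claim for your own workload is to compute per‑request cost from the listed rates. The sketch below uses the prices quoted above; the model keys `gpt-5.5` and `gpt-5.5-pro` are assumed labels for this example, and the baseline rate passed to `break_even_token_ratio` is a hypothetical placeholder, since this article does not state GPT‑5.4 pricing.

```python
from decimal import Decimal

# USD per million tokens (input, output), as quoted in this article.
# The dictionary keys are assumed labels, not confirmed API identifiers.
PRICES = {
    "gpt-5.5": (Decimal("5"), Decimal("30")),
    "gpt-5.5-pro": (Decimal("30"), Decimal("180")),
}

MILLION = Decimal(1_000_000)


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (in_rate * input_tokens + out_rate * output_tokens) / MILLION


def break_even_token_ratio(old_rate, new_rate):
    """Fraction of the old token count at which the new model costs the same."""
    return Decimal(old_rate) / Decimal(new_rate)


print(request_cost("gpt-5.5", 200_000, 50_000))      # 2.5
print(request_cost("gpt-5.5-pro", 200_000, 50_000))  # 15

# Hypothetical baseline output rate of $15 per million tokens (placeholder only).
print(break_even_token_ratio("15", "30"))            # 0.5
```

With the placeholder baseline, GPT‑5.5 would have to finish the same task in fewer than half the output tokens to be cheaper on output alone. Substitute the real rate of whichever model you are comparing against.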

Code example
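
The original code sample did not survive on this page, so what follows is a minimal stand‑in rather than the author's code. It builds a request for OpenAI's Responses API endpoint using only the Python standard library. The model identifier `gpt-5.5` is an assumption based on the product name in this article; check the official model list before relying on it. The `build_request` and `send` helpers are illustrative names.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"
MODEL_ID = "gpt-5.5"  # assumed identifier; verify against the official model list


def build_request(prompt, api_key, model=MODEL_ID):
    """Build (but do not send) a POST request for the Responses API."""
    payload = {"model": model, "input": prompt}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Nothing is sent here; the placeholder key keeps the example runnable offline.
req = build_request(
    "Summarize the open issues in this repository.",
    api_key=os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

To make a real call, set `OPENAI_API_KEY` in the environment and pass `req` to `send`. Building and sending are kept separate so the payload can be inspected or logged first.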

http://www.javaedge.cn/
Tags: security, benchmark, Agentic AI, coding, knowledge work, GPT-5.5
Written by

JavaEdge

First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
