GPT-5.4 Unleashed: Native PC Control, Million-Token Context, 50% Token Savings

OpenAI has launched GPT-5.4 Thinking and GPT-5.4 Pro, unifying reasoning, coding, computer operation, and agent abilities in one model, adding a million‑token context window, cutting token usage by nearly half, and posting benchmark gains over previous versions, including surpassing human performance on desktop operation.


OpenAI announced the simultaneous release of GPT-5.4 Thinking (model ID gpt-5.4) and GPT-5.4 Pro (model ID gpt-5.4-pro) across ChatGPT, the API and Codex, positioning the models for professional workloads.

Six core upgrades are highlighted:

First model with native computer operation: can interpret screenshots, issue mouse clicks and keyboard commands, and orchestrate cross‑application workflows.

Million‑token context window (up to 1M tokens) for longer planning and verification.

Tool Search mechanism that reduces token consumption by 47% while keeping accuracy unchanged.

OpenAI's most token‑efficient reasoning model, delivering up to 1.5× faster token output when /fast mode is enabled.

Interruptible “thinking” with a preamble outline, allowing users to steer the model mid‑reasoning.

Significant hallucination reduction: 33% lower false‑statement rate and 18% lower overall error probability.
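The announcement does not describe how Tool Search works, but its stated idea (expose only the tool definitions a task needs, instead of placing every schema in the prompt) can be sketched conceptually. Everything below, including the registry contents, the keyword matching, and the 4-characters-per-token estimate, is an illustrative assumption, not OpenAI's implementation.

```python
# Conceptual sketch of a "tool search" step: rather than sending every
# tool schema to the model, select only the schemas that match the task.
# Registry contents and the token estimate are illustrative assumptions.
import json

TOOL_REGISTRY = {
    "read_file":  {"description": "Read a file from disk", "params": {"path": "string"}},
    "write_file": {"description": "Write text to a file", "params": {"path": "string", "text": "string"}},
    "web_search": {"description": "Search the web for a query", "params": {"query": "string"}},
    "send_email": {"description": "Send an email message", "params": {"to": "string", "body": "string"}},
}

def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token)."""
    return len(text) // 4

def search_tools(task: str) -> dict:
    """Return only the tool schemas whose name or description overlaps the task."""
    words = set(task.lower().split())
    return {
        name: schema
        for name, schema in TOOL_REGISTRY.items()
        if words & set(name.split("_"))
        or words & set(schema["description"].lower().split())
    }

task = "search the web for GPT-5.4 benchmarks"
selected = search_tools(task)
print(sorted(selected))  # only the matching tool survives
print(estimate_tokens(json.dumps(TOOL_REGISTRY)), "tokens for all tools")
print(estimate_tokens(json.dumps(selected)), "tokens after tool search")
```

In this toy version the prompt carries one schema instead of four, which is the same lever the article credits for the 47% token reduction; the real mechanism presumably uses far more sophisticated retrieval.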

Benchmark highlights:

OSWorld (desktop operation) success rate 75.0%, far above GPT‑5.2 (47.3%) and surpassing human performance.

GDPval (knowledge work): 83.0% of outputs match or exceed industry professionals, versus 70.9% for GPT‑5.2.

BrowseComp (agent browsing) 89.3% for GPT‑5.4 Pro, a new record.

SWE‑Bench Pro (software engineering) 57.7% for GPT‑5.4, slightly higher than GPT‑5.3 Codex (56.8%).

Toolathlon (tool‑using agents) 54.6% for GPT‑5.4, up from 45.7% for GPT‑5.2.

External evaluations echo these results: Mercor’s CEO calls GPT‑5.4 “the best model we’ve tested,” noting top scores on the APEX‑Agents benchmark; Harvey’s AI‑legal platform reports a 91% score on the BigLaw Bench; Zapier’s CEO describes GPT‑5.4 xhigh as “the most persistent multi‑step tool‑using model.”

Pricing shows higher per‑token rates ($1.75/M input, $14/M output for GPT‑5.4; $2.50/M input, $15/M output for GPT‑5.4 Pro) but the token‑efficiency gains can lower total cost for many tasks.
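A back-of-envelope calculation with the listed GPT‑5.4 rates shows why the higher per-token price can still mean a lower total bill. The workload sizes below are illustrative assumptions; the 47% reduction is the Tool Search figure quoted above.

```python
# Cost comparison at the listed GPT-5.4 rates ($1.75/M input, $14/M output).
# Workload sizes are illustrative assumptions; the 47% token savings
# figure comes from the article's Tool Search claim.
INPUT_RATE = 1.75 / 1_000_000    # $ per input token
OUTPUT_RATE = 14.00 / 1_000_000  # $ per output token
REDUCTION = 0.47                 # claimed token savings

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical agent task: 100k input and 20k output tokens
# before any tool-schema trimming.
full = cost(100_000, 20_000)
trimmed = cost(int(100_000 * (1 - REDUCTION)), int(20_000 * (1 - REDUCTION)))

print(f"without token savings: ${full:.3f}")   # $0.455
print(f"with token savings:    ${trimmed:.3f}")  # $0.241
```

Because both line items scale linearly, the effective discount equals the token reduction: a competing model without the savings would need per-token rates below 53% of GPT‑5.4's to match the total cost of this task.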

Availability and transition:

ChatGPT Plus/Team/Pro users receive GPT‑5.4 Thinking via gradual rollout, replacing GPT‑5.2 Thinking.

Enterprise and Education customers can enable early access in the admin console.

GPT‑5.4 Pro is limited to Pro and Enterprise plans.

GPT‑5.2 Thinking will remain in the “Legacy Models” list for three months and retire on 2026‑06‑05.

API endpoints gpt-5.4 and gpt-5.4-pro are now live.
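Since the new model IDs slot into the existing API surface, a request body would look roughly like the sketch below. The payload follows the familiar Chat Completions shape; which parameters GPT‑5.4 actually accepts (and how /fast mode is toggled) is not specified in the article, so treat the fields beyond the model ID as assumptions.

```python
# Sketch of a request body targeting the new model IDs. The shape
# follows the standard Chat Completions format; GPT-5.4-specific
# parameters are assumptions, not documented behavior.
import json

payload = {
    "model": "gpt-5.4",  # or "gpt-5.4-pro" on Pro/Enterprise plans
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Summarize the open pull requests."},
    ],
    "max_tokens": 1024,
}

body = json.dumps(payload)
print(body)
```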

The release reshapes the competitive landscape: OpenAI consolidates reasoning, coding, computer operation and agent tools in one model; Anthropic’s Claude Opus 4.6 remains strong in agent browsing and scientific reasoning; Google’s Gemini 3.1 Pro shows gains in search and science. The intensified competition is expected to drive better, cheaper tools for developers and end users.

Overall recommendation: stay flexible and choose the model that best fits each specific task rather than committing to a single provider.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Node.js Tech Stack

Focused on sharing AI, programming, and overseas expansion
