OpenAI’s GPT‑5.5‑Cyber Beats Mythos with 85.6% on CyberGym
OpenAI’s new GPT‑5.5‑Cyber model outperforms Anthropic’s Mythos on multiple security benchmarks, achieving 85.6% on CyberGym and 39.5% on ExploitGym. The accompanying Daybreak initiative introduces the Codex Security plugin, the Patch the Planet programme, and trusted‑access collaborations, and it shifts defensive priorities toward rapid patching.
1. Record‑breaking Benchmarks: Surpassing Mythos
GPT‑5.5‑Cyber, OpenAI’s model for advanced, authorized cybersecurity tasks, outperforms the baseline GPT‑5.5 on three public benchmarks. On CyberGym (vulnerability discovery) it scores 85.6% versus Mythos’s 81.8%. On ExploitGym (exploitation verification) it reaches 39.5% versus 25.95%, a relative gain of about 52%. On SEC‑bench Pro (long‑term vulnerability hunting) it attains 69.8% versus 63.1%.
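The relative gain quoted for ExploitGym can be checked with a few lines of arithmetic. The sketch below uses only the scores reported above; the names `SCORES`, `gains` and `summarize` are illustrative and not taken from any benchmark tooling.

```python
# Scores in percent, as reported in the paragraph above:
# (GPT-5.5-Cyber score, comparison score).
SCORES = {
    "CyberGym": (85.6, 81.8),
    "ExploitGym": (39.5, 25.95),
    "SEC-bench Pro": (69.8, 63.1),
}


def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100.0
    return absolute, relative


def summarize(scores: dict = SCORES) -> dict:
    """Map each benchmark name to its (absolute, relative) gain."""
    return {name: gains(new, old) for name, (new, old) in scores.items()}


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in summarize().items():
        print(f"{name}: +{abs_gain:.2f} points, +{rel_gain:.1f}% relative")
```

On these figures the ExploitGym gain is 13.55 percentage points, or about 52% relative to 25.95%. The CyberGym and SEC‑bench Pro gains are much smaller in relative terms, roughly 4.6% and 10.6%.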
The security community is divided. Independent researcher Timothee Chauvin warns that CyberGym may be near saturation and questions whether 85.6% reflects real‑world generalisation. Others focus on the 39.5% ExploitGym figure: if nearly 40% of known bugs can be weaponised quickly, it is unclear how long defenders can hold off such AI‑assisted attacks.
2. Daybreak Initiative: End‑to‑End Fix Cycle
GPT‑5.5‑Cyber is released as part of OpenAI’s broader “Daybreak” programme, which focuses on three pillars:
Codex Security plugin: Has scanned over 30 million code commits across more than 30 000 repositories; over 70 000 findings were marked as fixed by human reviewers and more than 500 000 as automatically resolved. The plugin performs deep repository scans, traces attack paths, builds threat models, and auto‑generates patches that are then manually verified. It has already helped fix vulnerabilities in Firefox, V8, Safari, OpenBSD, FreeBSD and several HTTP/2 implementations.
Patch the Planet programme: Partners with Trail of Bits, HackerOne and the Calif research institute to fund security researchers and provide AI tools, collaborating directly with open‑source maintainers. In its first round, a five‑day sprint, participants reviewed hundreds of issues and merged dozens of patches; more than 30 projects took part, including cURL, Go, Python, Sigstore and pyca/cryptography.
Trusted Access for Cyber: Access to GPT‑5.5‑Cyber is limited to “verified defenders”. OpenAI is in ongoing dialogue with the U.S. government and has established collaborations with agencies in Australia, Canada, France, Germany, Japan and South Korea, and with the EU’s ENISA.
3. Blue‑Team Perspective: Faster Weaponisation, Closing the Defense Loop
OpenAI acknowledges that the bottleneck has moved. Historically the hard part was discovering vulnerabilities; now defenders are overwhelmed by the volume of discovered bugs, and the new bottleneck is patching.
Codex Security is designed to place the equivalent of a security engineer beside each developer. Rather than merely generating alerts, it is meant to understand the threat model, identify reachable vulnerabilities, collect evidence, and create and verify targeted patches.
The 39.5% ExploitGym score implies a large share of known bugs can be turned into exploitable code quickly, challenging traditional triage and prioritisation processes.
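One way to picture the pressure on triage is a priority score that weighs how easily a bug could be weaponised alongside its base severity. The following is a minimal sketch under stated assumptions, not a method described by OpenAI. The `Finding` fields, the `weight` parameter, the halving for unreachable code and the cap at 10 are all hypothetical choices. The 39.5% benchmark score is an aggregate result and should not be read as a per‑bug probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    ident: str
    base_severity: float       # 0-10, e.g. a CVSS-style base score
    exploit_likelihood: float  # 0-1, hypothetical per-finding estimate
    reachable: bool            # is the vulnerable code reachable in deployment?


def priority(f: Finding, weight: float = 0.5) -> float:
    """Scale base severity by exploit likelihood; discount unreachable code."""
    if not 0.0 <= f.exploit_likelihood <= 1.0:
        raise ValueError("exploit_likelihood must be within [0, 1]")
    score = f.base_severity * (1.0 + weight * f.exploit_likelihood)
    if not f.reachable:
        score *= 0.5  # assumption: unreachable findings count half
    return min(score, 10.0)


def triage(findings: list[Finding], weight: float = 0.5) -> list[Finding]:
    """Return findings sorted from highest to lowest priority."""
    return sorted(findings, key=lambda f: priority(f, weight), reverse=True)


if __name__ == "__main__":
    backlog = [
        Finding("A", base_severity=7.0, exploit_likelihood=0.9, reachable=True),
        Finding("B", base_severity=9.0, exploit_likelihood=0.1, reachable=False),
        Finding("C", base_severity=6.0, exploit_likelihood=0.0, reachable=True),
    ]
    for f in triage(backlog):
        print(f.ident, round(priority(f), 2))
```

In this toy backlog the easily exploited, reachable finding A ranks above B even though B has the higher base severity. That reordering is the kind of shift a weaponisation‑aware model would produce.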
Security operations teams should focus on three actions: (1) embed AI‑assisted code audit deeply into CI/CD pipelines to shorten discovery‑to‑remediation cycles; (2) recalibrate vulnerability severity models to incorporate AI‑driven weaponisation efficiency; (3) actively contribute to upstream open‑source security ecosystems to extend protective capabilities into critical supply‑chain components.
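As an illustration of the first action, a CI/CD pipeline can refuse to merge while high‑severity findings remain open. The sketch below assumes a hypothetical JSON report: a list of objects with `id`, `severity` and `status` fields. This is not the Codex Security plugin's actual output format, which the source does not describe. The threshold of 7.0 is likewise an arbitrary example.

```python
import json
import sys


def blocking_findings(findings: list[dict], threshold: float = 7.0) -> list[dict]:
    """Select findings that are not fixed and meet the severity threshold."""
    return [
        f for f in findings
        if f.get("status") != "fixed"
        and float(f.get("severity", 0.0)) >= threshold
    ]


def gate(report_text: str, threshold: float = 7.0) -> int:
    """Return a process exit code: 1 if any finding blocks the build, else 0."""
    findings = json.loads(report_text)
    blockers = blocking_findings(findings, threshold)
    for f in blockers:
        print(f"BLOCKING {f.get('id', '?')} severity={f.get('severity')}",
              file=sys.stderr)
    return 1 if blockers else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage in a pipeline step: python gate.py report.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(gate(fh.read()))
```

A non‑zero exit code fails the pipeline step, so the discovery‑to‑remediation loop closes inside the pipeline before a merge rather than in a separate ticket queue.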
References:
Daybreak: Tools for securing every organization in the world – OpenAI, 2026‑06‑22
@IntCyberDigest tweet, 2026‑06‑22
Black & White Path
We are the beacon of the cyber world, a stepping stone on the road to security.
