GPT-5.5 Matches Mythos on Cyber Attacks. One's in Your API Key.

Key Takeaways

- GPT-5.5 scored 71.4% on UK AISI expert-level cyber benchmarks versus Mythos' 68.6%. And GPT-5.5 is available to anyone with an API key.
- AI cyber capability now doubles every 4 months. The November 2025 estimate was 8 months. That gap between attackers and defenders just got wider.
- A reverse-engineering task costing $1.73 and 10 minutes now replaces what takes a human expert 12 hours. Your patch cycle hasn't changed. That's the problem.
- Mythos hacked an industrial control system for the first time in AI history. UK ministers responded by saying traditional defenses aren't enough.

The UK AI Security Institute dropped new benchmark data on May 14. Should make every small operator sit up.

OpenAI's GPT-5.5, the model anyone can call right now through the public API, scored 71.4% on expert-level cyber attack tasks.

Anthropic's Mythos — the one they restricted to roughly 40 organizations because they deemed it too dangerous to release widely — scored 68.6%.

The publicly available model outperformed the heavily gatekept one.

I've been running OpenAI's Daybreak and Codex Security against my own agency repos this week.

Here's the framing nobody's using yet: AI capability that matches restricted models is now in your API key.

And it's in your adversaries' API keys too.

The Access Myth

The access asymmetry that Anthropic bet on hasn't materialized as a safety moat.

GPT-5.5 hit 71.4%.

Mythos landed at 68.6%. That's not a gap. It's noise.

Whatever moat Anthropic thought access controls would create, the benchmark data doesn't support it. CISA and UK NCSC have both warned that capability restrictions don't work when the underlying model performance can be matched through other channels. If you wanted to stress-test that thesis, the numbers just did it for you.

For solo operators, here's the practical takeaway: your threat model changed whether you're ready or not. Whoever is running GPT-5.5 against your exposed services has the same capability delta working in their favor. Same model tier. Same benchmark-adjacent performance. Different intent.

What Doubling Time Actually Means

Let me give you the number that should keep you up tonight.

AI cyber task capability is doubling every 4 months now.

Not 8 months, the November 2025 estimate. Not 4.7, the February figure. Roughly 4 as of this week.

My agency's internal tracking from METR data shows 4.2 months. We're measuring the same thing from other angles and landing in the same place.
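If you want to sanity-check a doubling-time estimate yourself, the math is a least-squares fit of log2(capability) against elapsed months; the doubling time is the reciprocal of the slope. A minimal sketch, with made-up sample points rather than METR's actual data:

```python
import math

def doubling_time_months(observations: list[tuple[float, float]]) -> float:
    """observations = [(months_elapsed, capability_score), ...].
    Fit log2(score) = a + b * months by least squares; doubling time = 1 / b."""
    xs = [m for m, _ in observations]
    ys = [math.log2(s) for _, s in observations]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return 1.0 / slope

# Synthetic series that doubles exactly every 4 months:
points = [(0, 1.0), (4, 2.0), (8, 4.0), (12, 8.0)]
print(doubling_time_months(points))  # 4.0
```

Feed it whatever capability metric you track (task-horizon length, benchmark score, tasks solved per dollar) and it spits out the doubling time in the same units as your x-axis.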

Here's what that looks like in practice.

A technique that required a week of manual effort 18 months ago now runs in under an hour. One more doubling and it's 30 minutes; two, and it's 15. Your weekly vulnerability scan? It's scanning for last quarter's threat landscape. Your monthly dependency audit? It's auditing threats that existed before your last grocery run.

The number that crystallized this for me: GPT-5.5 solved a reverse-engineering challenge in 10 minutes 22 seconds at $1.73 API cost.

The same task takes a human expert roughly 12 hours. That's not a 10x improvement. It's roughly a 70x speed increase at lunch-money prices.
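The arithmetic is worth doing yourself, because the headline multiplier depends on it:

```python
# Back-of-envelope check on the reverse-engineering claim:
# 12 hours of expert time versus a 10 minute 22 second model run.
human_seconds = 12 * 3600      # 43,200 s
model_seconds = 10 * 60 + 22   # 622 s
speedup = human_seconds / model_seconds
print(f"{speedup:.0f}x faster, at $1.73 in API cost")  # roughly 69x
```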

If you're running any web-facing service and your security process involves "run a scan when you remember," you're not managing risk. You're hoping.

The doubling time is the real operational metric.

Not accuracy percentages. Not benchmark leaderboards. How fast your threat model becomes obsolete.

The Industrial Control System Result

This got buried under the benchmark coverage and it shouldn't have.

Anthropic's Mythos completed a 7-step simulated attack on an industrial control system called "Cooling Tower." Zero previous AI models had solved it. UK government ministers cited AISI's findings the same day and warned that traditional cyber defenses are no longer sufficient.

This isn't fearmongering.

It's a government agency telling you the current playbook has a shelf life.

So here's what to do. Not eventually. Today.

First, audit your admin panels and VPNs.

Most breaches at small businesses don't involve nation-state capability. They involve a management interface with no 2FA and an exposed port. Check every admin URL you run. Force 2FA on every single one. If your VPN doesn't support hardware keys or TOTP, replace it. This costs you $0 and takes an evening.
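A quick first pass on that audit is just checking which management ports are reachable at all. A minimal sketch using only the standard library; the endpoint list is a placeholder, so swap in your own hosts and ports:

```python
import socket

# Placeholder inventory -- replace with your actual admin panels and VPN hosts.
ADMIN_ENDPOINTS = [
    ("127.0.0.1", 8443),  # e.g. an admin panel
    ("127.0.0.1", 3389),  # RDP should never be reachable from outside
]

def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in ADMIN_ENDPOINTS:
    status = "OPEN" if is_port_open(host, port) else "closed/filtered"
    print(f"{host}:{port} -> {status}")
```

Run it from outside your network. Anything that prints OPEN and doesn't force 2FA is your first fix tonight.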

Second, run automated vulnerability scanning on every public repository and set it to run on every pull request. Not monthly. Not weekly. On every PR. I use OpenAI's Codex Security on my own repos and it's caught two dependency issues I would have shipped. The cost is the API bill. The cost of not doing it is shipping vulnerable code to a client.

Third, check whether the Google GTIG "Big Sleep" patch from May 11 applies to your stack.

If you're running any open-source system administration tooling, it likely does. CISA's Known Exploited Vulnerabilities catalog is the minimum baseline. Pin your dependency versions. Subscribe to security advisories for every major library you ship.
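CISA publishes the KEV catalog as a JSON feed, so the check is scriptable. A sketch of the matching logic against a catalog-shaped sample (the inline data is illustrative, not real entries; fetch the live feed from cisa.gov in practice):

```python
# Sample mimicking the structure of CISA's known_exploited_vulnerabilities.json.
SAMPLE_KEV = {
    "vulnerabilities": [
        {"cveID": "CVE-2024-0001", "vendorProject": "ExampleCorp",
         "product": "AdminPanel", "shortDescription": "Auth bypass"},
        {"cveID": "CVE-2024-0002", "vendorProject": "OtherVendor",
         "product": "Widget", "shortDescription": "Remote code execution"},
    ]
}

def kev_matches(catalog: dict, watchlist: set[str]) -> list[dict]:
    """Return KEV entries whose vendor or product appears on your watchlist."""
    return [
        entry
        for entry in catalog.get("vulnerabilities", [])
        if entry.get("vendorProject") in watchlist
        or entry.get("product") in watchlist
    ]

hits = kev_matches(SAMPLE_KEV, {"AdminPanel"})
for entry in hits:
    print(entry["cveID"], "-", entry["shortDescription"])
```

Build the watchlist from your actual stack (the output of your dependency pins plus your infrastructure vendors) and run it on a schedule. Any hit means patch now, not next cycle.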

Why This Matters Right Now

The AI safety community spent two years arguing about capability restrictions. Anthropic restricted Mythos to ~40 organizations. OpenAI shipped GPT-5.5 to everyone. The benchmarks say it doesn't matter. The restricted model and the open model score within 3 percentage points of each other.

You don't have to have an opinion on AI governance to be inside the blast radius.

The three actions above aren't security theater. They're the minimum viable response to a changed threat landscape. I've been running them on my own agency repos this week. Finding things I would have missed.

And the gap between "I thought that was fine" and "Codex just flagged it" is always uncomfortable, even when it's your own code.

The tools are available. The benchmark data is public. Your attack surface is not theoretical.

Figure out which of the three things above you haven't done yet.

Do that one today.

Sources

- UK AISI: How Fast Is Autonomous AI Cyber Capability Advancing
- The Register: AI Models Are Getting Better at Replacing Cybersecurity Pros
- Bruce Schneier: OpenAI's GPT-5.5 Is As Good As Mythos at Finding Security Vulnerabilities
- Zvi Mowshowitz: Cyber Lack of Security and AI Governance
- UK Government: Ministers Step Up Cyber Defence Action
- CISA Known Exploited Vulnerabilities Catalog
- UK NCSC Threat Report 2025
- Google Security Blog: Big Sleep AI Zero-Day