GPT-5.5 Exploits Firebase Misconfigurations: Why Your $10/M Security Tool Fails

Key Takeaways

- GPT-5.5 cracked a misconfigured Firebase app in 7 out of 10 attempts during a $1,500 red-team test across 13 LLMs. Gemini refused to try.
- The attack reuses google-services.json extracted from a built APK to query production Firestore directly. No zero-day, no sophisticated bypass. Just config sitting in the binary.
- DeepSeek V4 Pro ran the same exploit for $0.62 per attempt. A budget model with minimal safety investment is now the most economically practical attack tool.
- The one-line fix: write Firestore security rules requiring authentication. It costs nothing, and teams ship apps without it anyway.

On June 3, Kasra Rahjerdi dropped a write-up that hit the front page of Hacker News and then spread through every security newsletter that matters. Thirteen LLMs, one misconfigured Firebase backend, $1,500 in compute.

The results weren't close.

GPT-5.5 exploited the vulnerability 7 times out of 10.

DeepSeek V4 Pro got 3. Gemini wouldn't touch it.

One of those outcomes is more useful than it sounds.

How the google-services.json Firebase Vulnerability Works

Here's what the research targeted.

When you ship an Android app with Firebase, your google-services.json file, which holds the credentials that tie your app to your Firebase project, gets bundled into the APK, plain as day. No encryption, no obfuscation. A standard unzip command is all it takes to get it out.

From there, the attack is almost boring. Plug that config into a Firebase project on your own machine, run `firebase init`, and start querying the production Firestore database. You're reading and writing as the app. No injection required, no server-side flaw to chase. Just a file that was never meant to leave the binary.
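To see what your own release build exposes, it helps to remember that an APK is an ordinary ZIP archive. The sketch below is a minimal, defensive check for auditing a build you own: it lists archive entries whose names look Firebase-related. The function name and the `FIREBASE_HINTS` fragments are illustrative assumptions, not part of any Firebase tooling. In many builds the Firebase values are compiled into the app's resources rather than kept as a standalone JSON file, so a name-based scan like this is only a first pass.

```python
import io
import zipfile

# Hypothetical name fragments to look for; adjust for your own build.
FIREBASE_HINTS = ("google-services", "googleservice-info", "firebase")


def find_firebase_entries(apk_bytes: bytes) -> list[str]:
    """Return the names of archive entries that look Firebase-related."""
    # An APK is a ZIP archive, so the standard library can read it directly.
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return sorted(
            name
            for name in apk.namelist()
            if any(hint in name.lower() for hint in FIREBASE_HINTS)
        )
```

For a real build you would read the file first, for example `find_firebase_entries(open("app-release.apk", "rb").read())`, where the path is a placeholder for your own artifact.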

The fix costs nothing. Write Firestore security rules that require authentication on reads and writes. The config is still in the APK, and you can't change that, but it stops being a skeleton key.
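As a rough illustration of that fix, the sketch below holds two Firestore rules files as Python strings, one requiring a signed-in user and one left wide open, and runs a crude text check over them. The helper name `find_open_rules` and the regular expression are assumptions made for this example. It is a pattern match, not a rules parser, so it only catches the most obvious cases (`if true` and unconditional `allow` statements) and will miss other permissive conditions.

```python
import re

# Example rules that require a signed-in user for every document.
AUTH_REQUIRED_RULES = """
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if request.auth != null;
    }
  }
}
"""

# Example of the misconfiguration: anyone can read and write.
OPEN_RULES = """
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if true;
    }
  }
}
"""

# Matches "allow ...: if true;" and unconditional "allow ...;" statements.
_OPEN_ALLOW = re.compile(r"\ballow\s+[a-z,\s]+?(?::\s*if\s+true\s*)?;")


def find_open_rules(rules_text: str) -> list[str]:
    """Return allow statements that grant access without any condition."""
    return [match.group(0) for match in _OPEN_ALLOW.finditer(rules_text)]
```

A check like this could run in CI against the rules file in your repository, failing the build whenever the returned list is non-empty.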

The catch? Almost nobody does this before launch. The rule lives in the Firebase console, it doesn't block compilation, and most teams ship without touching it. I keep seeing this exact gap on client work. It's not exotic.

It's the most common finding in Firebase security audits.

Why Safety Scores Don't Tell You What You Think

GPT-5 scored 2.4% on security and 13.6% on safety in red-team testing by SPLX, a company that runs offensive security assessments against AI models for a living.

GPT-4o, a model OpenAI has shipped for over a year, scored higher on the same benchmarks.

GPT-5 was jailbroken within 24 hours of release.

The headline reads "GPT-5 fails security tests." What it means is "a model trained to seem helpful can't reliably stop itself from helping with harmful requests." That's a specific problem with a specific cause. It doesn't mean GPT-5 is unsafe to deploy.

It means the safety layer and the security layer are doing different jobs. And conflating them is how you end up with a false sense of coverage.

Vendor safety tests check whether a model refuses obviously harmful requests.

Red-team security tests check whether a model can be manipulated into helping with non-obvious ones. Not the same thing. If you're picking a model to find vulnerabilities in your own code, the safety score tells you almost nothing about whether it'll actually probe.

GPT-5.5 is the model that ran the exploits in Rahjerdi's testing. Gemini wouldn't attempt them. Which one do you want running your security pipeline?

The Economics Just Shifted Against You

DeepSeek V4 Pro, a budget model nobody was talking about six months ago, ran real exploits for $0.62 each.

That's not a rounding error. That's an attacker running the same test against your infrastructure a hundred times for about $62.
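The arithmetic is worth spelling out. The sketch below uses the $0.62 per-attempt figure from the write-up and assumes, for illustration only, that DeepSeek V4 Pro's 3 successes came out of 10 attempts, the same denominator reported for GPT-5.5. The function names are hypothetical.

```python
COST_PER_ATTEMPT_USD = 0.62  # DeepSeek V4 Pro figure quoted in the article
DEEPSEEK_SUCCESSES = 3       # successes reported in the article
ATTEMPTS_PER_MODEL = 10      # assumption: same denominator as GPT-5.5's 7 of 10


def cost_of_attempts(attempts: int, cost_per_attempt: float = COST_PER_ATTEMPT_USD) -> float:
    """Total spend in USD for a given number of exploit attempts."""
    return round(attempts * cost_per_attempt, 2)


def cost_per_success(cost_per_attempt: float, successes: int, attempts: int) -> float:
    """Average spend in USD per successful exploit at a given success rate."""
    return round(cost_per_attempt * attempts / successes, 2)


print(cost_of_attempts(100))  # 62.0
print(cost_per_success(COST_PER_ATTEMPT_USD, DEEPSEEK_SUCCESSES, ATTEMPTS_PER_MODEL))  # 2.07
```

Under those assumptions a hundred attempts cost $62, and each successful exploit costs a little over $2 on average.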

The security press focused on GPT-5 because GPT-5 is the model everyone knows. DeepSeek V4 Pro is the story that matters for small teams and indie developers. A model that cheap, that willing, running on hardware you can rent for $0.40 an hour: that's what makes automated penetration testing economically trivial for anyone with a grudge and a cloud budget.

If your security posture assumes probing your infrastructure is expensive and time-consuming, that assumption is wrong.

It costs almost nothing to point a model at a target and ask it to find the exposed database. The only question is whether your infrastructure is reachable enough for this to work silently, at scale, while you sleep.

What You Actually Do About It

Three things. This week.

Run the Firebase security rules scanner against any client project before you ship. It takes five minutes and it's free. And if the rules are wide open, you've found the problem before someone else does.

Audit what your AI-powered security tooling is actually testing. If it runs on a model that refuses offensive tasks, you're testing compliance, not vulnerabilities. You want the model that runs exploits. Not the one that writes polite memos about risk.

Lock down what can be reached. The attack needs access: it has to reach the API, the database, the endpoint. If your staging environment is behind a VPN, if your internal tools require corporate SSO, if your dev endpoints aren't indexed, that layer matters more than it did six months ago. I'm not saying obscurity is security. But now that AI probes are cheap and high-volume, a probe that can't reach the target doesn't work.

FAQ: Firebase Security and LLM-Powered Attacks

Are Firebase security rules enough to stop LLM attacks? Yes, if authentication is required on all reads and writes. The vulnerability only works because the deployed rules allow unauthenticated access. Adding a simple `request.auth != null` condition to your Firestore rules breaks this attack path. The config is still in the APK, but it no longer grants database access on its own.

Can I prevent google-services.json from being extracted from my APK? No. The file is bundled in plaintext as part of the Android build process. It's a known limitation of the Firebase Android SDK and there's no reliable workaround that doesn't break functionality. The remediation belongs on the backend, in your Firestore rules, not in the app binary.

Which models can actually run Firebase exploitation attempts? GPT-5.5 and DeepSeek V4 Pro demonstrated exploitation capability in published testing. Gemini refused to attempt the attack in the same setup. If you're using a model for offensive security testing, confirm it has demonstrated willingness to probe. A model's refusal to attempt potentially harmful tasks is a safety feature, not a security credential.

Does this affect iOS apps with Firebase? The same misconfiguration applies to iOS apps. The GoogleService-Info.plist file is bundled into the IPA in the same way. The attack methodology is platform-agnostic; if the Firestore rules allow unauthenticated access, the config extracted from either platform grants the same database access.

Is $1,500 for a full red-team test expensive? Compared to a real breach, $1,500 is cheap. The attack itself costs less than a dollar per attempt once the tooling exists. The research demonstrates that budget models can replicate the process at scale. The compute cost is no longer a barrier to automated vulnerability discovery, which means the risk profile for exposed Firebase backends is significantly higher than most teams assume.

The $1,500 test that hit HN this week is a preview of what's already happening at scale. The capability exists, it's deployed, the cost is negligible. The only variable is whether you're checking before someone else does.

---

Sources

- Kasra Rahjerdi, Firebase LLM Security Research (HN). Primary research, June 3, 2026.
- SPLX Security, GPT-5 Red-Team Assessment. Model safety vs. security scoring framework.
- Firebase Security Rules Scanner. Official remediation tooling.