Anthropic's Claude Discovers 22 Firefox Vulnerabilities in Two Weeks

Anthropic has announced a collaboration with Mozilla in which its Claude Opus 4.6 model discovered 22 security vulnerabilities in the Firefox browser over two weeks. Mozilla classified 14 of them as high-severity, which is nearly a fifth of all high-severity Firefox vulnerabilities remediated in 2025.

The findings mark a significant milestone in AI-assisted cybersecurity and raise new questions about how AI models will reshape vulnerability research.

How It Started

The collaboration grew out of Anthropic's internal benchmarking. In late 2025, the company noticed Claude Opus 4.5 was close to solving all tasks in CyberGym, a benchmark testing whether LLMs can reproduce known security vulnerabilities. Anthropic wanted a harder, more realistic test, and chose Firefox because of its complexity and its status as one of the best-tested open-source projects in the world.

The team first tasked Claude with reproducing previously known CVEs in older Firefox versions. When Claude succeeded at a high rate, Anthropic escalated to the real challenge: finding novel, never-before-reported vulnerabilities in the current Firefox codebase.

Twenty Minutes to the First Zero-Day

After just twenty minutes of exploration, Claude Opus 4.6 flagged a use-after-free vulnerability in Firefox's JavaScript engine. This is a memory corruption bug in which a program keeps using memory after it has been freed, which could let attackers overwrite data with malicious content. Anthropic researchers validated the finding independently, then submitted it, along with a Claude-written patch, to Mozilla's Bugzilla issue tracker.
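To make the bug class concrete, here is a minimal sketch of the use-after-free pattern. It is a simulation in Python with fully defined behavior, not the actual Firefox bug and not real memory corruption. The `SlotPool` class and the string values are hypothetical, introduced only for illustration. The point is that a handle kept after release ends up referring to whatever is allocated into the same slot next.

```python
class SlotPool:
    """Toy allocator (hypothetical): freed slots are handed out again first."""

    def __init__(self, size):
        self.slots = [None] * size
        # Stack of free indices; slot 0 is handed out first.
        self.free = list(range(size - 1, -1, -1))

    def alloc(self, value):
        index = self.free.pop()
        self.slots[index] = value
        return index  # plays the role of a pointer

    def release(self, index):
        self.slots[index] = None
        self.free.append(index)  # slot is now available for reuse

    def read(self, index):
        # No liveness check: this is the flaw being illustrated.
        return self.slots[index]


pool = SlotPool(4)
victim = pool.alloc("trusted callback")
pool.release(victim)  # object is freed, but the stale handle is kept
attacker = pool.alloc("attacker-controlled data")  # reuses the same slot
stale_read = pool.read(victim)  # stale handle now sees the attacker's data
```

In a real C++ program the equivalent access is undefined behavior rather than a clean read, which is why such bugs usually surface first as crashes.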

While the team was still validating and filing that first report, Claude had already found fifty more unique crashing inputs.

112 Reports, Bulk Submission

Mozilla encouraged Anthropic to submit findings in bulk. By the end of the effort, Claude had scanned nearly 6,000 C++ files and produced 112 unique reports. Most of the issues were fixed in Firefox 148.0, with the remaining patches scheduled for upcoming releases.

Firefox 148.0 shipped to hundreds of millions of users with fixes sourced directly from this AI-driven audit.

Can Claude Exploit What It Finds?

Anthropic also tested whether Claude could go beyond finding bugs to actually exploiting them — developing tools a hacker would use to execute malicious code.

Across several hundred test runs costing approximately $4,000 in API credits, Claude produced working exploits in only two cases, and only in a testing environment with some security features intentionally removed. The takeaway: Claude is far better at finding vulnerabilities than at exploiting them, and identifying bugs is an order of magnitude cheaper than creating exploits.
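As a rough back-of-the-envelope check, the figures above imply the following cost per working exploit. This is a sketch using only the approximate numbers reported here. The exact number of runs is not given, so no per-run cost is computed, and the variable names are illustrative.

```python
API_SPEND_USD = 4_000   # approximate total API spend reported above
WORKING_EXPLOITS = 2    # exploits that worked in the weakened test environment

# Average spend per working exploit under these reported figures.
cost_per_exploit = API_SPEND_USD / WORKING_EXPLOITS
print(f"~${cost_per_exploit:,.0f} per working exploit")
```

The result is an average over the whole exploitation experiment, including the failed attempts.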

What This Means for the Industry

This collaboration establishes a model for how AI companies and open-source maintainers can work together on security. Mozilla researchers have since started experimenting with Claude for their own internal security work.

The results also feed into a broader pattern Anthropic recently documented: Claude finding over 500 zero-day vulnerabilities across well-tested open-source software. Browser-grade codebases were previously considered the hardest targets for automated vulnerability discovery.

For software teams everywhere, the message is clear: AI-powered security auditing is no longer theoretical. It's finding real, high-severity bugs in production code — fast.

What's Next

Anthropic says it plans to expand this kind of security collaboration to other major open-source projects. Mozilla's willingness to accept bulk AI-generated reports and share triage processes sets a precedent that other maintainers may follow.

As AI models grow more capable at identifying — and potentially exploiting — security flaws, the race between defense and offense accelerates. The Firefox collaboration suggests that, for now, defense has the upper hand.
