Anthropic and Mozilla wrapped up a two-week security trial in early 2026 that put Claude Opus 4.6 to work hunting bugs in Firefox, and it found quite a few. Over the course of the engagement, Claude scanned nearly 6,000 C++ files and filed 112 unique vulnerability reports. Twenty-two were confirmed as real vulnerabilities. Of those, 14 were rated high-severity, which Anthropic says amounts to roughly a fifth of all high-severity Firefox flaws patched in 2025.
The project grew out of Anthropic's internal benchmarking work. After Claude Opus 4.5 began approaching the ceiling on CyberGym, a benchmark that tests whether models can reproduce known security vulnerabilities, the company wanted to test Claude against a real, hardened codebase. Mozilla's Firefox fit the bill. Claude started with the JavaScript engine, partly because of its large attack surface and its role in running untrusted web content. Within twenty minutes, it had flagged a use-after-free bug. This is a memory-safety flaw in which a program keeps using memory after it has been freed, and it can let attackers overwrite that memory with malicious content. By the time the first bug report landed in Mozilla's Bugzilla tracker, Claude had generated 50 more crashing inputs. Most of the confirmed issues were fixed in Firefox 148.0.
Anthropic also ran a separate test to see whether Claude could turn its findings into working exploits. After several hundred attempts costing roughly $4,000 in API credits, it succeeded only twice, and both times the test environment had the browser's sandbox and other key security features disabled. Anthropic's read: AI is making vulnerability discovery faster and cheaper, but building a functional exploit remains a harder problem. Defenders still have the edge, though the gap is narrowing.
Alongside the Firefox results, Anthropic published Coordinated Vulnerability Disclosure (CVD) principles meant to guide how AI researchers and software teams handle disclosures at scale. The company also announced Claude Code Security, a limited research preview that brings the same kind of automated scanning to individual developers and open-source maintainers. That extends the approach beyond one-off partnerships like the one with Mozilla.