Mythos Meets Reality: Small Models Find Same Zero-Day Bugs

AISLE ran the specific Mythos vulnerabilities Anthropic highlighted on April 7 through cheap, open-weights models and found that eight out of eight detected Mythos's flagship FreeBSD exploit. One model has just 3.6 billion active parameters and costs $0.11 per million tokens. A 5.1B-parameter open model recovered the core chain of a 27-year-old OpenBSD bug. Anthropic announced Mythos alongside Project Glasswing, committing up to $100M in usage credits and framing it as a breakthrough for autonomous zero-day discovery. The implied message: you need frontier-scale intelligence for serious security work. AISLE's evidence suggests you don't.

They would know. AISLE has operated its own discovery and remediation pipeline since mid-2025, finding 15 CVEs in OpenSSL, 5 in curl, and over 180 externally validated CVEs across 30+ projects. Stanislav Fort at AISLE describes AI cybersecurity capability as "jagged": rankings reshuffle completely across tasks, and no single model consistently wins. AISLE is model-agnostic by design, and their pitch is blunt: a thousand adequate detectives searching everywhere beat one brilliant detective guessing where to look.

Critics on Hacker News pushed back on the methodology. Some pointed out that AISLE tested older model versions like Qwen3 32B and DeepSeek R1 when newer releases would likely score higher. The bigger complaint: AISLE pointed models directly at flawed code rather than testing autonomous discovery, which is the genuinely hard part. Let small models roam free to find bugs and you'd likely drown in false positives. The study may undersell what Mythos actually does end-to-end.

None of this kills AISLE's point. If cheap models handle much of the detection work, the real advantage is in how you orchestrate the whole thing. What code you look at. How you test your guesses. How you separate real bugs from noise. Whether maintainers trust you enough to accept your patches. AISLE's security analyzer now runs on OpenSSL and curl pull requests, catching bugs before they ship. The OpenSSL CTO praised the "high quality of the reports and constructive collaboration throughout the remediation." That's the full loop from discovery to accepted patch, and AISLE closed it without needing one proprietary model to do so.