Anthropic's research institute has published internal figures showing how far AI has crept into building AI: as of May 2026, more than 80% of the code merged into its own codebase is written by Claude, up from low single digits before Claude Code launched in early 2025.

The sharper number is capability, not volume. On the most open-ended engineering tasks, the ones with no clear specification, Claude's session success rate climbed to 76% in May 2026, a 50 percentage point jump in six months. In one research experiment, AI agents recovered 97% of a defined performance gap over 800 cumulative hours and roughly $18,000 of compute; two human researchers managed 23% in a week. Anthropic flags the catch plainly: humans still choose which problems matter, and "research taste" is the gap between today's tools and a system that could design its own successor.

The institute argues the world should at least have the option to pause frontier development, while conceding that training runs are far easier to conceal than missile silos. That detection problem, not the willingness to stop, is the part nobody has solved.