AI Writes Clean Code. That's Why Bugs Slip Through.

Teams adopting AI coding tools like Claude Code are shipping dramatically more code, but the review burden is crushing them. A Latent Space analysis confirms the pattern across the industry: high AI adoption correlates with 98% more merged PRs and 91% more time in review. Output doubled. Review time nearly doubled too. An economics paper by Catalini, Hui, and Wu calls this the "Trojan Horse" externality: deploying unverified systems becomes rational for each team even as systemic risk accumulates.

The code is also harder to review. When humans write buggy code, they leave traces: weird variable names, confused comments, awkward structure. AI writes clean, idiomatic, well-commented code that hides bugs in plain sight. Bram Cohen, writing on vibe coding, argues that human oversight is still required.

There's also a confidence gap at work. A frontend engineer asks Claude to write a database query and gets back something that looks correct, but lacks the expertise to verify it. A METR study from mid-2025 captured the illusion: developers thought AI made them 20% faster. Actual measurements showed they were 19% slower.

Ray argues the fix requires verification infrastructure: test suites as ground truth, human-written acceptance criteria defined before AI touches any code, and adversarial agents that try to break what the coding agent built. That last piece matters most. AI-written tests share the same blind spots as AI-written code because the same agent produces both. They document the happy path the AI already envisioned. You need a separate verification process that challenges the code rather than confirming it. Teams that skip this now are building up quiet technical debt that compounds fast.
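
The difference between a test that confirms and a check that challenges can be shown with a small sketch. Everything in it is hypothetical and chosen for illustration: the `chunk` function stands in for clean-looking generated code with a hidden bug, `meets_acceptance_criteria` stands in for criteria a human wrote up front, and `find_counterexample` is a crude random-input stand-in for an adversarial agent, not a description of any particular tool.

```python
import random


def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    # Reads as clean and idiomatic, but the range bound silently drops
    # a trailing partial batch (hypothetical bug for illustration).
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def happy_path_test():
    # The kind of test the authoring agent tends to write: it confirms
    # the one case the implementation was designed around.
    return chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]


def meets_acceptance_criteria(items, size, batches):
    # Assumed human-written criteria, fixed before any code exists:
    # no element is lost or reordered, and every batch holds 1..size items.
    flattened = [x for batch in batches for x in batch]
    return flattened == items and all(0 < len(b) <= size for b in batches)


def find_counterexample(fn, trials=1000, seed=0):
    # Adversarial stand-in: throw many small random inputs at `fn` and
    # return the first one that violates the acceptance criteria.
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 8))]
        size = rng.randint(1, 4)
        if not meets_acceptance_criteria(items, size, fn(items, size)):
            return items, size
    return None


if __name__ == "__main__":
    print("happy path passes:", happy_path_test())
    print("counterexample:", find_counterexample(chunk))
```

The happy-path test passes because its input length is an exact multiple of the batch size, which is the only case the buggy bound handles. The adversarial search is written against the criteria rather than the implementation, so it does not inherit that assumption and turns up an input whose length is not a multiple of the size.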