Andrej Karpathy coined 'vibe coding' in a February 2025 tweet, and the developer internet latched on immediately. The phrase fit. Prompt-driven development had changed what it felt like to write software: you describe what you want, the model generates something plausible, you iterate. Karpathy's framing, 'fully give in to the vibes, embrace exponentials, forget the code even exists,' captured the texture of that workflow better than anything else had. Collins Dictionary named it 2025's Word of the Year.

Then came the incidents.

The problem wasn't the phrase; it was the scope creep. What started as shorthand for weekend prototypes and throwaway experiments gradually got applied to anything AI-assisted, including production code at companies with real customers and real financial exposure. An AWS outage and Moonwell's $1.8 million bad-debt event, both traced back to AI-generated code that hadn't been adequately reviewed, gave critics something concrete to point at.

This month, CodeRabbit published a retrospective by David Kravets that traces the term's arc from useful neologism to loaded liability. The piece draws on Fastly survey data showing that around a third of senior engineers now ship code in which roughly half the lines are AI-generated, yet nearly 30 percent say auditing that output erases most of the time they saved generating it. That gap has a name now: the 'review tax.' CodeRabbit's own research adds another uncomfortable number: AI-generated code carries 1.7 times more bugs than human-written code.

The senior engineers absorbing that review burden aren't getting credit for it. The productivity metrics that justified the AI tooling investment don't capture time spent debugging subtle problems in code you didn't write and don't fully understand.

Karpathy has since reframed the practice as 'agentic engineering,' a model built around structured oversight and verification loops rather than pure generative momentum. Whether that framing takes hold depends partly on whether the industry treats recent incidents as flukes or as evidence of something structural. CodeRabbit, predictably, is positioning automated code review as the answer: 'vibe checks' as quality gates before agent-generated code ships. It's a commercial argument dressed as a lesson, but that doesn't make the underlying problem less real.