There's a pattern emerging in AI-accelerated codebases that nobody's put metrics on yet. The engineers with the shakiest instincts are shipping the most code. Michael Timbs has a name for it: the judgment-volume inversion. Before coding agents, output speed was a natural throttle on bad ideas. A developer who didn't fully understand what they were building would eventually slow down — confused, blocked, forced to reckon with the design. Agents remove that throttle. Whatever judgment is directing them, good or bad, now runs at scale.
Timbs, writing in an adaptation of a company-wide AI adoption talk, builds his case on a shelf of software theory the productivity-metrics crowd has largely set aside. The centrepiece is Peter Naur's 1985 essay 'Programming as Theory Building': the real product of software development is the programmer's mental model of the system, not the code. Code is the residue. If that's right, then optimising for code generation is optimising for the residue.
Fred Brooks is the obvious next stop. Timbs maps his essential/accidental complexity split onto the tools: AI can dissolve the accidental friction in software (the boilerplate, the repetitive scaffolding), but the essential difficulty, the part that requires actually understanding what you're building, doesn't compress. Timbs layers in David Parnas on decomposition, Michael Polanyi on tacit knowledge, and Paul Ralph's Sensemaking-Coevolution-Implementation model. Different frameworks, same wall: design knowledge is embodied and emergent. You can't articulate it completely in a prompt, so you can't generate your way past the need for it.
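Brooks's split can be made concrete with a toy sketch (a hypothetical illustration of mine, not an example from Timbs's talk; the `Invoice` type and `to_json` helper are invented for it). The serialization code is accidental complexity: mechanical work with one right answer, exactly the friction an agent dissolves well. The questions in the closing comments are the essential kind, and no amount of generated code settles them.

```python
import json
from dataclasses import dataclass, asdict

# Accidental complexity in Brooks's sense: a mechanical translation with
# one correct answer. An agent can churn this out safely at any volume.
@dataclass
class Invoice:
    id: str
    total_cents: int

def to_json(inv: Invoice) -> str:
    # Serialise the dataclass field-by-field to a JSON object.
    return json.dumps(asdict(inv))

# Essential complexity has no such mechanical answer: should Invoice and
# Order be one entity or two? Is total_cents stored, or derived from line
# items? Those decisions ARE the design, and they live in the designer's
# head, not in any prompt.
```

The point of the contrast is that only the first half is generable; the comments in the second half name decisions that precede any code at all.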
The most damaging dynamic Timbs identifies isn't about speed — it's about signal loss. Awkward APIs, messy coupling, the friction of trying to implement something that doesn't quite fit — these aren't just annoyances. They're diagnostic. They tell an engineer, early and cheaply, that their mental model of the system is wrong. Agents absorb that friction. The wrong architectural decision propagates quietly, with no resistance, and because AI models use existing code as guidance for generating more, each degradation compounds the next. Timbs is careful not to frame this as a junior-versus-senior problem: experienced engineers working in unfamiliar domains are equally exposed. Understanding has always been a function of doing, not of typing speed.
There's a narrow category where Timbs thinks agents earn their keep: high-repetition tasks where the correct output is unambiguous. Everything else is a structural mismatch: system architecture, domain modelling, the work of deciding what abstractions should even exist. Tests, linters, and static analysis can enforce constraints on code quality. They cannot tell you whether the mental model behind the code is coherent. That's the dimension of software quality that matters most, fails most silently, and accumulates the most technical debt before anyone notices. The productivity numbers agents produce are real. The question is whether anyone is measuring the right thing.
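The limit of mechanical checks can be shown with a hypothetical sketch (mine, not Timbs's; the pricing domain and all names are invented). Both implementations below satisfy the same test suite and would pass any linter, yet they encode very different mental models of the discount domain, and nothing in the tooling can tell them apart.

```python
from dataclasses import dataclass

# Model A: booleans bolted on as requirements arrived. Lint-clean and
# fully tested, but the domain model is incoherent -- the flag soup has
# no concept of "a discount", so every new rule means another branch.
def price_a(base: float, is_member: bool, is_sale: bool) -> float:
    if is_member and is_sale:
        return base * 0.8
    if is_member or is_sale:
        return base * 0.9
    return base

# Model B: the same observable rules, but the discount policy is an
# explicit concept that composes; a new discount is one more entry.
@dataclass(frozen=True)
class Discount:
    rate: float

def applicable_discounts(is_member: bool, is_sale: bool) -> list[Discount]:
    return [Discount(0.1) for flag in (is_member, is_sale) if flag]

def price_b(base: float, is_member: bool, is_sale: bool) -> float:
    rate = sum(d.rate for d in applicable_discounts(is_member, is_sale))
    return base * (1 - rate)

# The shared suite passes for both models; the checks are blind to the
# difference that will dominate the cost of the next ten changes.
for price in (price_a, price_b):
    assert price(100.0, False, False) == 100.0
    assert abs(price(100.0, True, False) - 90.0) < 1e-9
    assert abs(price(100.0, True, True) - 80.0) < 1e-9
```

Green tests certify behaviour at the sampled points; whether the abstraction underneath is coherent is exactly the property they cannot certify.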