A 15-month study from developer experience firm DX has put hard numbers on a gap that many engineering leaders have privately acknowledged: AI tool adoption is running well ahead of the productivity gains it was supposed to deliver.

Analyzing data from 40 companies between November 2024 and February 2026, DX found that despite a 65% average increase in AI tool usage, pull request throughput rose by just 9.97%. The figure reflects a meaningful methodological choice: DX excluded teams that had set individual PR throughput targets, stripping out the gamification effects that tend to inflate the metric.

Most organizations in the study landed in the 8–12% productivity gain range — measurable, but a long way from the 2–3x improvements that vendor benchmarks have led boards and executives to expect. The explanation DX surfaces is consistent across the developer interviews Justin Reock conducted for the study: coding was rarely the bottleneck to begin with. A task that previously took four days might now take three, but that compression doesn't compound across the full delivery cycle. Planning, scoping, code review, and cross-team handoffs — the coordination work that makes up a substantial share of engineering time — have not been meaningfully accelerated by current AI tooling.
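The arithmetic behind that compression effect can be sketched with an Amdahl's-law-style calculation. The coding-share and speedup figures below are illustrative assumptions, not numbers from the study, but they show how a sizable speedup on one phase shrinks to a single-digit gain end to end:

```python
def overall_gain(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's-law-style estimate: end-to-end throughput gain when
    only the coding phase of the delivery cycle is accelerated.

    coding_share   -- fraction of total cycle time spent coding (assumed)
    coding_speedup -- factor by which coding itself got faster (assumed)
    """
    new_cycle = (1 - coding_share) + coding_share / coding_speedup
    return 1 / new_cycle - 1

# Illustrative: coding is ~40% of cycle time, and a four-day task
# now takes three (a 4/3 speedup on the coding phase alone).
gain = overall_gain(0.40, 4 / 3)
print(f"{gain:.1%}")  # ~11% end-to-end, despite 25% less coding time
```

Under these assumed numbers the overall gain lands near the 8–12% range the study reports, even though the coding phase itself sped up markedly.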

That conclusion tracks with independent research from André Meyer at FlowLabs, who interviewed engineering organizations in Switzerland and reached similar findings. Meyer's analysis points to a methodological problem with the headline figures that circulate from vendors: the widely cited 30–60% productivity gains typically come from isolated task-completion studies — GitHub Copilot's 55.8% faster task-completion benchmark being a prominent example — that measure speed on a single activity rather than throughput across the full development cycle. Faster code generation, in practice, tends to redistribute review and validation work downstream rather than eliminate it. The 2024 DORA report adds a further complication, recording a 7.2% drop in delivery stability during the AI adoption period — suggesting that speed gains in one phase can introduce quality pressure elsewhere.

A BCG analysis referenced in FlowLabs' published research found that only 26% of companies generate tangible value from AI at scale. The gap between adoption rates and business outcomes points to a structural problem: the tooling and workflows required to extend AI's reach beyond code generation — into planning, review, and cross-team coordination — are still largely underdeveloped. That's where the next round of productivity arguments will be made, and where the evidence so far is thinnest.