A March 2026 opinion piece from codebase auditing firm untangle.work argues that the real problem with AI-generated software is not code quality but what it calls a "decision vacuum" — the absence of architectural judgment that no amount of prompt engineering can substitute for. The piece contends that while AI coding tools genuinely excel at producing syntactically correct, functional implementations at speed, they are structurally incapable of resolving the questions that actually define a codebase: how modules should relate to each other, which patterns to apply consistently, where to draw boundaries between concerns, and what not to build at all. The authors illustrate the problem with a concrete scenario in which sequential AI prompts for authentication, a settings page, and an API integration each produce working code that handles shared concerns like user sessions in incompatible ways — not because of any deliberate decision, but because no decision was ever made.
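The scenario can be made concrete with a minimal sketch. Everything below is hypothetical and illustrative, not taken from the piece: two independently prompted features each invent their own session convention, so each works in isolation yet they cannot interoperate.

```python
# Hypothetical sketch of the "decision vacuum": two AI-prompted features,
# each correct on its own, that never agreed on what a session is.
import hashlib

SECRET = "demo-secret"  # illustrative only

def login(user: str) -> str:
    """Prompt 1 (authentication): session identity lives inside a
    signed token the client carries around."""
    sig = hashlib.sha256(f"{user}:{SECRET}".encode()).hexdigest()[:12]
    return f"{user}.{sig}"

# Prompt 2 (settings page): assumes sessions live in a server-side
# store keyed by an opaque id -- a convention no one ever decided on.
SESSIONS: dict[str, dict] = {}

def get_settings(session_id: str) -> dict:
    """Settings feature: expects a server-side session record."""
    return SESSIONS[session_id]  # raises KeyError for any login() token

token = login("alice")
try:
    get_settings(token)
except KeyError:
    print("incompatible session conventions")
```

Both functions pass their own tests; the failure only appears at the seam between them, which is exactly the kind of cross-cutting decision the article argues no individual prompt will make.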
The piece is most useful as a symptom of where the AI tooling market currently stands rather than as neutral analysis. untangle.work occupies a defensible but precarious position: current LLMs genuinely cannot provide reliable architectural judgment, but the moment that changes, the firm's core service proposition disappears. The post's closing call-to-action — directing readers toward untangle.work's own auditing services — was not lost on Hacker News, and that conflict of interest is worth keeping in mind when weighing its claims.
The article's core claim does align with a growing body of empirical evidence about AI-assisted development. GitClear's analysis of 150 million lines of code found a 41% increase in code churn — code reverted or rewritten within two weeks — correlated with AI assistance. A 2025 survey of nearly 800 developers found 95% report spending extra time correcting AI-generated code, with senior engineers absorbing most of that burden. Y Combinator reported that 25% of its Winter 2025 batch had codebases that were 95% or more AI-generated, creating a cohort of production systems with minimal human architectural oversight. Cases like <a href="/news/2026-03-14-autonoma-rewrites-18-months-of-code-pivots-agentic-qa-platform-away-from-next-js">major platform rewrites driven by accumulated tech debt</a> illustrate the stakes. Those figures helped drive a "vibe coding cleanup" consulting market that split through 2025 into two camps: human-led boutiques such as untangle.work, ontologi.dev, and Ulam Labs, commanding $200–400 per hour for architectural review, and automated tooling players such as CodeRabbit, which raised a $60 million Series B at a $550 million valuation in September 2025 and cited vibe coding explicitly as a demand driver.
The Hacker News reception was skeptical. The top-voted comment dismissed the piece as a thinly veiled advertisement, a criticism that is not entirely unfair. A more substantive thread from a commenter with aerospace and electrical engineering experience pushed back on the broader "plan mode" trend in AI coding tools, arguing that what developers call planning with LLMs bears little resemblance to formal engineering design processes — requirements derivation, trade studies, model-based systems engineering — that take years to execute in mature disciplines. That critique implicitly reinforces the article's central point even while challenging its framing: <a href="/news/2026-03-14-ai-did-not-simplify-software-engineering">genuine architectural judgment is expensive and rigorous</a>, and neither AI plan mode nor the typical developer using AI today reliably provides it.
The same content engines enabling vibe coding are also flooding the internet with consultancy blog posts, making credibility increasingly dependent on demonstrated case studies rather than thought leadership. For now, the data on code churn and AI-assisted development suggests demand for cleanup services — human-led or automated — is not going away.