Don't Vibe – Prove

The vibe coding era may be shorter than anyone expected.

Nicolas Grislain, a machine learning engineer, published an essay this week arguing that AI-assisted development has already outgrown its current paradigm — and that the natural successor isn't better test suites or more sophisticated rule files, but dependent types. Specifically Lean 4, a proof assistant that doubles as a functional programming language, where the compiler doesn't just check types but verifies mathematical correctness.

The argument lands differently if you think about it from an agent workflow angle. The standard vibe coding loop — prompt, generate, run tests, iterate — has a fundamental ceiling: the specification and the code are always separate things. A test suite is a partial account of what you meant. Lean 4 collapses that gap by making the type signature itself a theorem, grounded in the Curry-Howard correspondence, which maps types to propositions and programs to proofs. A function that compiles without the `sorry` escape hatch isn't just type-safe — it's proven correct. Full stop.

Grislain demonstrated this using Claude to generate insertion sort and merge sort in Lean 4, each satisfying the same type signature: a function that takes a list and returns a sorted list, with sortedness baked into the return type. Both compiled. Both were machine-verified. The insertion sort's proof is substantially longer than its algorithm, which Grislain treats as the point rather than the problem — every logical step was checked by the compiler, not eyeballed by a reviewer.

Where the essay gets genuinely interesting for anyone building agent pipelines is here: the proof-construction feedback loop — generate a candidate term, receive a type-checker rejection, iterate — is exactly the kind of adversarial, symbolic search task that agents handle well and humans handle badly. It's pattern matching over large combinatorial spaces under tight formal constraints. Humans lose patience. Agents don't. In that framing, Lean 4 isn't just a programming language — it's an oracle. The type-checker becomes a deterministic verifier an agent can query thousands of times without ambiguity. No flaky tests, no hallucinated pass rates. The compiler either accepts the proof or it doesn't.

The skepticism writes itself, though. Grislain's demos involve textbook algorithms with well-understood proof structures. Real production code — distributed systems, stateful APIs, anything touching IO — is a different proposition entirely. Formal verification of a sorting algorithm is a solved problem; formal verification of a payment processing flow is a research project. The Lean 4 community itself is small, its tooling immature by mainstream standards, and the question of whether current models can handle the kind of multi-step, compositional proofs that would matter in practice remains genuinely open. Cherry-picked demos have a way of flattering the thesis.

Still, the structural point holds independent of whether Lean 4 goes mainstream. If the feedback signal is tight and formal, agents iterate faster and more reliably than in any human-review loop. The compiler-as-oracle model is more tractable for agents precisely because it's unambiguous — and that's a real advantage over the status quo, whatever proof assistant eventually captures it.