When Informal Systems set out to implement Fast Tendermint — a BFT consensus variant that trades a higher node requirement for a potential one-round performance gain — the engineering estimate was several months. The team finished in about a week.
The shortcut wasn't a faster programmer or a better AI code generator. It was a workflow that treats LLMs as translators rather than authors, with a formal specification language called Quint carrying the actual correctness burden.
The four steps are straightforward in principle, less obvious in practice. An LLM converts an English protocol description into a Quint spec change. Engineers then interrogate that spec directly — running Quint's simulator and model checker to probe reachability, check properties, and stress-test Byzantine fault assumptions — before touching any code. Once the spec survives that process, the LLM generates implementation code from it. Model-based testing then confirms that the running code and the spec produce identical behavior under identical scenarios, carrying formal confidence down to the production artifact.
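The final step, model-based testing, can be sketched in miniature. This is an illustrative toy, not Malachite code: `SpecModel` stands in for the Quint spec's transition relation, `Impl` for the generated implementation, and both names are invented here. The point is the shape of the check, driving both sides with the same scenario and asserting they never diverge.

```python
# Toy sketch of model-based testing: feed identical action sequences to a
# reference model (the "spec" side) and an implementation, and assert they
# agree at every step. All names here are hypothetical, not from Malachite.

import random


class SpecModel:
    """Reference model: a trivial state machine standing in for the spec."""
    def __init__(self) -> None:
        self.height = 0

    def apply(self, step: int) -> int:
        self.height += step
        return self.height


class Impl:
    """Implementation under test: must match the model on every action."""
    def __init__(self) -> None:
        self._h = 0

    def apply(self, step: int) -> int:
        self._h = self._h + step
        return self._h


def run_conformance(seed: int, n_steps: int = 100) -> None:
    """Replay one randomized scenario against both sides and compare."""
    rng = random.Random(seed)
    model, impl = SpecModel(), Impl()
    for _ in range(n_steps):
        action = rng.randint(1, 3)  # same scenario fed to both sides
        assert model.apply(action) == impl.apply(action), "divergence found"


run_conformance(seed=42)
print("spec and implementation agree on 100 random scenarios")
```

In the real workflow the scenarios come from the Quint simulator and model checker rather than a random number generator, which is what lets formal confidence about the spec carry down to the running code.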
What makes the workflow worth paying attention to is step two. Two bugs surfaced in the original English protocol description during spec validation — not in the code, because there was no code yet. Finding protocol-level mistakes before implementation is what formal methods have always promised. The cost is what usually kills it.
The test case is Malachite, a production BFT consensus engine now owned by Circle and deployed in the Arc blockchain. Fast Tendermint requires at least 5F+1 nodes to tolerate F Byzantine faults, instead of the classical 3F+1, taking on a higher node count for the chance to shave a communication round. It's a meaningful change to a system where subtle bugs carry real consequences — which is presumably why it had been sitting on the roadmap.
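The arithmetic of that trade-off is worth making concrete. A minimal sketch (the helper names are invented for illustration):

```python
# Minimum node counts to tolerate F Byzantine faults under each bound.
# Illustrative helpers only; not drawn from Malachite or the Quint spec.

def min_nodes_classical(f: int) -> int:
    """Classical Tendermint requires N >= 3F + 1."""
    return 3 * f + 1


def min_nodes_fast(f: int) -> int:
    """Fast Tendermint requires N >= 5F + 1."""
    return 5 * f + 1


for f in (1, 2, 3):
    print(f"F={f}: classical needs {min_nodes_classical(f)}, "
          f"fast needs {min_nodes_fast(f)}")
# F=1: classical needs 4, fast needs 6
# F=2: classical needs 7, fast needs 11
# F=3: classical needs 10, fast needs 16
```

Tolerating a single Byzantine node thus takes six validators instead of four; the gap widens linearly as F grows, which is the cost paid for the possible one-round speedup.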
Informal Systems' case isn't that LLMs make formal methods obsolete. It's that formal methods have always been worth doing but routinely skipped because the translation labor — English intent into machine-checkable spec — was too expensive to justify. LLMs reduce that cost enough to change the arithmetic. One case study doesn't prove the pattern generalizes, but the numbers here are hard to argue with.