Dave Rapin built the third version of Curling IO, his curling club management platform, almost entirely with AI coding agents. In a blog post published this week, he argues that the experience convinced him that Gleam, a statically typed functional language with a small community and limited training data, beats JavaScript and Python for agentic development, even though the agents write worse Gleam.
The counterintuitive case rests on feedback loops. Agents like Claude Code, OpenAI Codex, and Google Gemini work by generating code, running it, reading the error, and trying again. In dynamically typed languages, bugs can hide until a production deployment — or, as Rapin puts it, a 2am incident. Gleam's compiler catches them in seconds, with precise file-level errors that give agents a machine-readable list of exactly what to fix.
Gleam's type system also eliminates entire categories of bugs before the loop starts. There is no null in the language; optional values must be declared as Option(T) and handled explicitly for both Some and None cases. Rapin cites research that nil dereference errors appear in 70% of production environments, and points to the June 2025 Google Cloud outage — where a single null database field cascaded through the Service Control system into a multi-hour global failure — as the kind of incident Gleam's compiler would have caught before deployment.
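The explicit-handling requirement is easy to see in code. A case expression over an `Option` must be exhaustive, so dropping the `None` branch in the sketch below would be a compile error, not a runtime surprise. (The function and field names here are illustrative, not drawn from Rapin's codebase.)

```gleam
import gleam/int
import gleam/option.{type Option, None, Some}

// A value that may be absent must say so in its type.
pub fn describe_score(score: Option(Int)) -> String {
  case score {
    Some(n) -> "Score: " <> int.to_string(n)
    // Deleting this branch makes the compiler reject the
    // whole function: the None case must be handled.
    None -> "No score recorded"
  }
}
```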
The same logic applies when code changes. Add a field to a data structure or a new variant to a union type, and every construction and pattern-match site that touches it fails to compile. Agents get an explicit checklist rather than a silent runtime surprise discovered weeks later.
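That checklist effect is simple to demonstrate with a custom type. In the hypothetical example below, adding a new variant (say, `Cancelled`) to the union makes every non-exhaustive case expression over it fail to compile, handing the agent the exact list of sites to update:

```gleam
// A hypothetical status type for a club registration system.
pub type Registration {
  Pending
  Confirmed
  Waitlisted
}

pub fn status_label(reg: Registration) -> String {
  case reg {
    Pending -> "Pending payment"
    Confirmed -> "Confirmed"
    Waitlisted -> "On waitlist"
  }
}
// Add a Cancelled variant to Registration and this case
// expression stops compiling until Cancelled is handled.
```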
Rapin concedes the obvious objection: agents trained on far less Gleam than Python or TypeScript produce messier first drafts. He treats that as a fixed upfront cost that a faster correction loop more than offsets. The canonical `gleam format` tool helps too — it normalises all output without configuration, removing the whitespace inconsistencies that he says routinely derail LLM-generated code in indentation-sensitive languages like Slim.
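Because `gleam format` takes no configuration, any syntactically valid input settles into one canonical layout regardless of how the agent spaced it. A hypothetical before-and-after:

```gleam
// Agent output before formatting:
//   pub fn add(a:Int,b:Int)->Int{a+b}

// The same function after `gleam format`:
pub fn add(a: Int, b: Int) -> Int {
  a + b
}
```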
The argument carries implications beyond Gleam. As AI agents move from autocomplete tools to primary authors of production software, the factors that governed language popularity — training data volume, developer mindshare — may matter less than the quality of a compiler's error signal. Rapin's conclusion is blunt: a language that tells you exactly what is wrong, instantly, beats one that tells you approximately what is wrong, eventually.