Simon Willison, creator of the Datasette web framework and one of the most widely read independent voices on AI development, delivered a fireside chat at the Pragmatic Summit in San Francisco in February 2026, moderated by Statsig's Eric Lui. The session is now on YouTube.

Willison traced the adoption arc he sees among developers: from using ChatGPT as a question-answering tool, to agents writing the majority of code, to the emerging frontier where developers neither write nor read the code being produced. He cited StrongDM, a security company, as a case study of that final stage. Their "software factory" approach, where nobody writes or reads any code, struck him as "wildly irresponsible" — and yet it appears to be working in practice, which he said warrants close attention.

A central theme was how Willison has restructured his own workflow around trusted agentic output. He credited Claude Opus 4.5 as the first model to earn his genuine professional trust, comparing it to relying on an external team's API without auditing their source code. He has become an advocate for red-green test-driven development with agents, despite personally disliking the practice throughout his career. His argument: agents absorb the tedium of writing failing tests first, and tests are now "effectively free" and therefore non-optional. He also introduced Showboat, a tool he built that instructs agents to document their own manual testing sessions as structured markdown logs of curl commands and outputs — a way to capture real server-level behavior that unit tests miss.
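The red-green loop Willison describes can be sketched in a few lines. This is an illustrative toy, not code from the talk: the `slugify` function and its behavior are invented for the example. The point is the ordering — the test exists and fails ("red") before the implementation that makes it pass ("green"), which is exactly the tedium an agent can absorb.

```python
import re

def test_slugify():
    # Red phase: these assertions fail until slugify() below exists.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Green phase: the minimal implementation that satisfies the test.
def slugify(text: str) -> str:
    """Lowercase the text, drop punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

test_slugify()
print("tests pass")
```

With an agent driving, the human's job reduces to reviewing the assertions in the red phase — the part that encodes intent — rather than the implementation.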

Willison also described two concepts he has started applying in practice. The first is "conformance-driven development": rather than reading a specification, he had Claude build parallel implementations of multipart file upload support across several languages and frameworks, including Go, Node.js, Django, and Starlette, then used the consistency across those implementations to reverse-engineer a de facto standard, which became the test suite for his own Datasette implementation. The second is the "lethal trifecta": the combination of exposure to untrusted content, access to private data, and the ability to take <a href="/news/2026-03-14-secure-secrets-management-for-cursor-cloud-agents-using-infisical">consequential actions</a> such as communicating externally. He advocates sandboxing agents specifically to limit this attack surface.
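The conformance idea reduces to a simple mechanism: feed the same input to several independent implementations and treat their agreement as the de facto spec. The sketch below uses two toy boundary-extracting parsers as stand-ins for real multipart parsers in different frameworks; the names and behavior are illustrative only.

```python
from collections import Counter

def impl_a(header: str) -> str:
    # Toy parser A: naive split on the 'boundary=' parameter.
    return header.split("boundary=")[1].strip('"')

def impl_b(header: str) -> str:
    # Toy parser B: same job, done by scanning each ';'-separated parameter.
    for part in header.split(";"):
        key, _, value = part.strip().partition("=")
        if key == "boundary":
            return value.strip('"')
    raise ValueError("no boundary parameter found")

def consensus(case, impls):
    """Return the unanimous answer; any disagreement flags a spec ambiguity."""
    results = Counter(impl(case) for impl in impls)
    answer, votes = results.most_common(1)[0]
    if votes < len(impls):
        raise AssertionError(f"implementations disagree on {case!r}: {results}")
    return answer

case = 'multipart/form-data; boundary="----xyz"'
expected = consensus(case, [impl_a, impl_b])
print(expected)  # the agreed-upon value becomes a fixture in the test suite
```

Where the implementations disagree, the input is exactly the kind of edge case a written spec would have left ambiguous — which is the signal Willison is mining for.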

The security argument is where Willison's position is hardest to dismiss. LLMs are incredibly <a href="/news/2026-03-14-rag-document-poisoning-attack">gullible by design</a>, he said — they follow instructions, including injected ones. That is not a fixable bug; it is the feature. Any agentic system with real-world reach is carrying prompt injection risk today, and sandboxing is the only practical check on what happens when a model gets fooled.
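One minimal form of that check — invented here for illustration, not a description of any real sandbox — is to gate every tool call an agent proposes through an explicit allowlist, so an injected instruction cannot reach a consequential action even when the model is fooled. The tool names below are hypothetical.

```python
# Hypothetical tool names; the allowlist is the sandbox boundary.
ALLOWED_TOOLS = {"read_file", "run_tests"}

def dispatch(tool: str, args: dict) -> str:
    """Execute an agent-proposed tool call only if it is allowlisted."""
    if tool not in ALLOWED_TOOLS:
        # A prompt-injected request for a consequential action dead-ends here.
        return f"refused: {tool!r} is outside the sandbox"
    return f"ok: would run {tool} with {args}"

print(dispatch("read_file", {"path": "README.md"}))
print(dispatch("send_email", {"to": "attacker@example.com"}))
```

The design choice matches Willison's framing: since gullibility cannot be patched out of the model, the enforcement has to live outside it, in what the model is physically permitted to do.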