Irish developer Bill de hÓra has released Modelwerk, an open-source educational project that implements four landmark neural network architectures entirely in pure Python — no NumPy, no PyTorch, no external libraries of any kind. Every operation, from matrix multiplication to softmax to backpropagation, is built from scalar floating-point arithmetic using only Python's standard library. The four completed lessons trace a deliberate historical arc: Rosenblatt's Perceptron (1958), the <a href="/news/2026-03-14-multi-layer-perceptron-javascript-dual-number-autodiff">Rumelhart-Hinton-Williams backpropagation MLP</a> (1986), LeCun's LeNet-5 (1998), and Vaswani et al.'s Transformer (2017). Each lesson is a standalone runnable script that trains a model and prints a narrative explanation of the architecture, mathematics, and results as it executes.
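To give a flavor of what "scalar floating-point arithmetic with only the standard library" looks like in practice, here is a minimal sketch of matrix multiplication and softmax over plain Python lists. This is illustrative only, not Modelwerk's actual code; the function names and shapes are assumptions for the example.

```python
import math

def matmul(a, b):
    """Multiply two matrices (lists of lists) using only scalar arithmetic."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)                              # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Everything bottoms out in individual multiplies, adds, and `math.exp` calls, which is precisely what makes such code slow and legible at the same time.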

The project's philosophical core is anti-abstraction. De hÓra frames modern AI development through the metaphor of the Jacquard loom: where that invention made weaving legible by replacing opaque child labor with readable punch cards, he argues AI has moved in the opposite direction — practitioners increasingly fine-tune, prompt-engineer, and RAG-stack systems they cannot explain from first principles. The hard constraint of using only the Python standard library is not a performance choice — training the Transformer takes minutes where PyTorch would take milliseconds — but a deliberate mechanism to keep every computation traceable all the way down to a single floating-point multiply. The pedagogical sequencing is equally intentional: each architecture is chosen because it solves exactly one problem its predecessor could not, from XOR through spatial structure to global attention.
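The XOR starting point is easy to demonstrate. The sketch below (my own illustration, not Modelwerk's code) applies Rosenblatt's learning rule to a single threshold unit: on the linearly separable AND function it converges, while on XOR — which no single linear boundary can split — it cycles forever.

```python
def perceptron_train(data, epochs=100, lr=0.1):
    """Rosenblatt's rule on one threshold unit; returns (weights, bias, converged)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            if err:
                errors += 1
                w[0] += lr * err * x1   # nudge each weight toward the target
                w[1] += lr * err * x2
                b += lr * err
        if errors == 0:
            return w, b, True           # a full error-free pass: separable
    return w, b, False                  # never converged within the budget

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

This is the gap the 1986 backpropagation MLP closes: a hidden layer of such units, trained end to end, handles XOR easily.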

Modelwerk was built with Claude Code, in what de hÓra describes as an "eyes-on, hands-off" agentic engineering workflow — he shaped decisions about architecture and mathematics while Claude handled the code. A fifth lesson is planned: Sakana AI's Continuous Thought Machines paper (Darlow et al., NeurIPS 2025 spotlight), which introduces neural synchronization and adaptive internal iteration — mechanisms with no precedent in any of the four architectures already covered. Llion Jones, one of the original co-authors of "Attention Is All You Need," is also a co-author on the CTM paper. Modelwerk's arc thus ends with the Transformer, then hands the baton to one of the Transformer's own architects working to supersede it.