CNN Explainer is a browser-based tool that lets you watch a convolutional neural network process an image, layer by layer, in real time. Built by Georgia Tech's Polo Club of Data Science and presented at IEEE VIS 2020, it runs a small CNN trained on Tiny ImageNet and renders every transformation as it happens — convolution filters scanning across pixel data, ReLU activations zeroing out negatives, pooling layers compressing spatial information into tighter feature maps.
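The three operations the tool animates are easy to state in code. Here's a minimal NumPy sketch of a convolution filter scanning an image, ReLU zeroing negatives, and max pooling compressing the result; the function names and the toy edge-detector example are illustrative, not code from CNN Explainer itself:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel across the image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Zero out negative activations, pass positives through."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping pooling: keep the largest value in each window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 6x6 "image" with a vertical edge down the middle.
image = np.array([[1, 1, 1, 0, 0, 0]] * 6, dtype=float)

# A vertical-edge detector kernel.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

features = relu(conv2d(image, kernel))  # 4x4 map, high where the edge sits
pooled = max_pool(features)             # 2x2 map after spatial compression
```

The feature map lights up only along the edge, which is exactly the kind of cause-and-effect the visualization makes visible at interactive speed.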

The whole thing runs client-side in JavaScript, with D3.js handling the visualization. Select or upload an image, and the visualization updates with each forward pass. Hover over any layer to see which input features triggered it. It takes the standard textbook treatment of CNNs — dense with notation, light on intuition — and turns it into something you can actually poke at.

CNNs aren't the story of the moment. Transformers get the research coverage, and most agent infrastructure runs on language models. But the vision encoders inside GPT-4o, Gemini, and Claude — the parts that process image inputs before anything gets passed to the language model — still rely on convolutional or patch-based designs that trace back to CNN fundamentals. For anyone building multimodal pipelines, what happens before the image embedding matters.
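The "patch-based" connection is concrete: a ViT-style patch embedding, the kind these vision encoders build on, is numerically just a convolution whose kernel size equals its stride. A toy NumPy sketch, with shapes and names chosen for illustration:

```python
import numpy as np

def patch_embed(image, weights, patch=4):
    """ViT-style patch embedding: split a single-channel image into
    non-overlapping PxP patches, flatten each, and apply one shared linear
    projection - equivalent to a conv with kernel size = stride = P."""
    h, w = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    return patches @ weights  # (num_patches, embed_dim)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))     # toy 8x8 single-channel image
weights = rng.standard_normal((16, 3))  # 4x4 patch -> 3-dim embedding
tokens = patch_embed(image, weights)    # 4 patch tokens, 3 dims each
```

Each output row is one image patch projected into token space; the first token equals the top-left 4x4 patch, flattened, times the projection matrix. That convolution-in-disguise is the part of the pipeline CNN Explainer's layer views still build intuition for.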

CNN Explainer resurfaced on Hacker News this week and drew a fast-moving thread. The tool is six years old and hasn't changed much. That it still generates this kind of engagement says something about how thin the supply of genuinely clear explanatory tools remains, even as the systems built on top of these architectures keep getting more capable.