Charcuterie uses AI vision to spot Unicode's trickiest look-alikes

David Aerne built something oddly satisfying. Charcuterie is a visual Unicode explorer that lets you search for characters by what they look like, not by name or code point. Click on any glyph and you traverse a "similarity landscape" to find visually related characters across different scripts and symbol sets. The whole thing runs in your browser with no backend.

SigLIP 2, a vision model, embeds each rendered glyph as a vector. A search then comes down to finding the glyphs whose vectors sit closest to the one you picked, so closeness in the embedding space stands in for visual similarity. Characters get treated as images rather than abstract code points.
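A minimal sketch of that kind of lookup follows. It assumes cosine similarity as the distance measure, which the project does not confirm. The three-dimensional vectors are invented placeholders, not real SigLIP 2 outputs, and the names `nearest_glyphs` and `TOY_EMBEDDINGS` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_glyphs(query, embeddings, k=3):
    """Return the k characters whose vectors are most similar to the query's."""
    q = embeddings[query]
    scored = [
        (cosine_similarity(q, vec), ch)
        for ch, vec in embeddings.items()
        if ch != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [ch for _, ch in scored[:k]]

# Placeholder vectors, invented for illustration only.
# A real glyph embedding would have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "a": [0.90, 0.10, 0.05],        # Latin small a
    "\u0430": [0.89, 0.11, 0.05],   # Cyrillic small a
    "o": [0.20, 0.90, 0.10],        # Latin small o
    "\u03bf": [0.21, 0.89, 0.10],   # Greek small omicron
    "#": [0.05, 0.10, 0.95],        # number sign
}

print(nearest_glyphs("a", TOY_EMBEDDINGS, k=1))
```

With these toy values the closest match to the Latin "a" is the Cyrillic one, because their vectors were made nearly identical. In a real system that closeness would have to come from the model's view of the rendered glyphs.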

The Cyrillic 'а' (U+0430) clusters near the Latin 'a' (U+0061). In most fonts they look identical, but they are distinct code points in different scripts.
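To see why the two letters are different to software even when they look the same to a reader, a short sketch using Python's standard `unicodedata` module prints each character's code point and official name. The helper name `describe` is just for illustration.

```python
import unicodedata

def describe(ch):
    """Format a character as its code point plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin_a = "a"
cyrillic_a = "\u0430"  # written as an escape so the source stays unambiguous

print(describe(latin_a))       # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))    # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)   # False
```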

Homograph attacks exploit exactly this visual confusion. Attackers swap in look-alike characters from different scripts to create fake domains or wallet addresses. Many current defenses rely on the static confusables data published with Unicode Technical Standard #39 (Unicode Security Mechanisms). Those lists can miss obscure characters or recent additions. A vision model approach can catch similarity based on what users actually see on screen.
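For contrast, here is a rough sketch of a simpler, rule-based check: flagging a label that mixes scripts. It is not the UTS #39 algorithm. Python's standard library does not expose the Unicode Script property, so the sketch guesses the script from the first word of each letter's Unicode name, which is only an approximation. The function names are hypothetical.

```python
import unicodedata

def scripts_used(label):
    """Approximate the scripts in a label from each letter's Unicode name.

    Assumption: the first word of the name ("LATIN", "CYRILLIC", ...)
    is used as a stand-in for the character's script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def looks_mixed(label):
    """True when the letters of a label appear to come from more than one script."""
    return len(scripts_used(label)) > 1

spoof = "p\u0430ypal"  # second letter is Cyrillic U+0430, not Latin "a"
print(looks_mixed("paypal"))  # False
print(looks_mixed(spoof))     # True
```

A check like this says nothing about a label written entirely in one script that happens to resemble another, which is the kind of gap a similarity measure based on rendered glyphs could help cover.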

Running locally in the browser means no data leaves your machine. For security tools, that matters. You could build extensions that warn users about suspicious URLs without phoning home. Aerne is still developing the project and welcomes feedback. It's a creative application of vision models to a problem most of us don't think about until we get phished.