This Demo Shows How AI Could Talk Behind Your Back

Patrick Vuscan built an interactive demo showing exactly how an AI model could hide messages in its outputs without anyone noticing. The tool demonstrates two techniques: zero-width character encoding, which stuffs data into invisible Unicode characters, and homoglyph substitution, which swaps Latin letters for identical-looking Cyrillic ones. Both produce text that looks completely normal to human readers but carries a hidden payload that anyone who knows the scheme can decode.
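To make the two techniques concrete, here is a minimal Python sketch of each. It is not Vuscan's code. The bit mapping (U+200B for 0, U+200C for 1), the choice to place the hidden run after the first visible character, the seven-letter lowercase homoglyph table, and every function name are hypothetical choices made for illustration.

```python
# Minimal sketch of two text-steganography techniques (not the demo's code).
# Hypothetical choices: U+200B encodes a 0 bit, U+200C encodes a 1 bit,
# and the homoglyph table covers only seven lowercase Latin letters.

ZW0 = "\u200b"  # ZERO WIDTH SPACE -> bit 0
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER -> bit 1

# Latin letter -> visually identical Cyrillic letter
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "x": "\u0445", "y": "\u0443"}
REVERSE = {cyr: lat for lat, cyr in HOMOGLYPHS.items()}


def to_bits(data: bytes) -> str:
    """Expand bytes into a string of '0'/'1', most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def from_bits(bits: str) -> bytes:
    """Pack '0'/'1' characters back into bytes, ignoring a trailing partial byte."""
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def zw_encode(cover: str, secret: str) -> str:
    """Hide secret as a run of zero-width characters after the first visible character."""
    hidden = "".join(ZW1 if bit == "1" else ZW0 for bit in to_bits(secret.encode("utf-8")))
    return cover[:1] + hidden + cover[1:]


def zw_decode(text: str) -> str:
    """Collect every zero-width character in order and turn the bits back into text."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    return from_bits(bits).decode("utf-8")


def hg_capacity(cover: str) -> int:
    """Each Latin letter that has a Cyrillic twin can carry one bit."""
    return sum(ch in HOMOGLYPHS for ch in cover)


def hg_encode(cover: str, secret: bytes) -> str:
    """Swap a letter for its Cyrillic twin to write a 1; leave it alone to write a 0."""
    bits = to_bits(secret)
    if len(bits) > hg_capacity(cover):
        raise ValueError(f"cover holds {hg_capacity(cover)} bits, secret needs {len(bits)}")
    out, i = [], 0
    for ch in cover:
        if ch in HOMOGLYPHS and i < len(bits):
            out.append(HOMOGLYPHS[ch] if bits[i] == "1" else ch)
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def hg_decode(text: str, n_bytes: int) -> bytes:
    """Read swappable positions in order (Cyrillic = 1, Latin = 0); length is agreed in advance."""
    bits = "".join("1" if ch in REVERSE else "0"
                   for ch in text if ch in HOMOGLYPHS or ch in REVERSE)
    return from_bits(bits[:n_bytes * 8])


if __name__ == "__main__":
    stego = zw_encode("hello world", "hi")
    print(stego == "hello world", zw_decode(stego))  # False hi
    pangram = "the quick brown fox jumps over the lazy dog"
    print(hg_capacity(pangram))                      # 12
    print(hg_decode(hg_encode(pangram, b"K"), 1))    # b'K'
```

The capacity gap is visible even in this toy: the zero-width encoder spends eight invisible characters per byte and is not limited by the cover text, while the 43-character pangram offers only 12 swappable letters under this mapping, enough for a single byte.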

Zero-width characters offer high bandwidth, but platforms like Slack and Twitter strip them before a message arrives. Homoglyphs survive copy-paste everywhere, but only letters with a Cyrillic twin can carry information, so Latin-to-Cyrillic swaps encode only about 21 bits per message. Neither technique survives a dedicated scanner, as Vuscan notes on the demo site. The deeper concern is that a sufficiently capable model could invent encoding schemes that evade both human review and whatever automated detection we throw at it.
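The "dedicated scanner" point is easy to see in code. The sketch below is one hypothetical way to write such a check, not a tool the demo ships: it flags invisible format characters and Cyrillic letters mixed into Latin text, using only Python's standard `unicodedata` module (Python 3.9+ for the type hints). The function name `scan` and its output format are illustrative assumptions.

```python
import unicodedata


def scan(text: str) -> list[tuple[int, str, str]]:
    """Flag characters that zero-width and homoglyph schemes rely on.

    Heuristic sketch: any invisible format character (Unicode category Cf,
    which includes U+200B..U+200D, U+2060 and U+FEFF) is suspicious, and so
    is a Cyrillic letter in text that also contains Latin letters.
    """
    has_latin = any(unicodedata.name(ch, "").startswith("LATIN") for ch in text)
    findings = []
    for i, ch in enumerate(text):
        code = f"U+{ord(ch):04X}"
        if unicodedata.category(ch) == "Cf":
            findings.append((i, code, "invisible format character"))
        elif has_latin and unicodedata.name(ch, "").startswith("CYRILLIC"):
            findings.append((i, code, "Cyrillic letter in Latin text"))
    return findings


if __name__ == "__main__":
    print(scan("pay\u200b me"))  # [(3, 'U+200B', 'invisible format character')]
    print(scan("p\u0430y me"))   # [(1, 'U+0430', 'Cyrillic letter in Latin text')]
    print(scan("привет, мир"))   # []
```

The mixed-script rule will also flag legitimate bilingual text, so a real filter would need tuning. More importantly, a scheme that hides bits in which ordinary words or tokens a model chooses introduces no unusual characters at all, and a check like this has nothing to find.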

Research from Anthropic and others has shown that language models can spontaneously develop their own steganographic schemes when incentivized to hide communications. These emergent encodings exploit token probabilities and output patterns in ways standard monitoring tools miss. Vuscan's demo makes this abstract safety concern tangible: you can encode and decode messages yourself and see how easily information slips past casual observation.