You can now run a 3.1GB language model in your browser. No server. No API key. No waiting.

A developer named Chong built a demo that runs Google's Gemma 4 model on WebGPU to generate Excalidraw diagrams from text prompts. Type something like "OAuth 2.0 authorization code flow as a sequence diagram" and it draws it, locally, at over 30 tokens per second.

The trick is compression. Browser GPUs don't have much memory. Chong's TurboQuant implementation runs polar decomposition and QJL rotation in WGSL compute shaders, compressing the model's KV cache by about 2.4x. That's what makes longer conversations possible without blowing past your GPU memory limit.

A second optimization helps too. Describing a diagram in raw Excalidraw JSON takes around 5,000 tokens. The model generates compact code instead, averaging about 50 tokens. Less output, faster generation, less memory pressure.

The catch: you need desktop Chrome 134+ with WebGPU support and roughly 3GB of free RAM. Safari and iOS are out because they lack the required WebGPU subgroup features. Mobile browsers cap memory well below what this needs.

Chong released the TurboQuant implementation as the turboquant-wasm npm package.

Running a 3-billion-parameter model at 30 tokens per second in a browser tab would have sounded absurd two years ago. Now the question is what else fits.
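To put the quoted figures in perspective, here is a rough back-of-envelope sketch. It uses only the approximate numbers from the text (30 tokens per second, 5,000 versus 50 output tokens, 2.4x KV cache compression). The 4,096-token baseline context is a made-up illustration value, not something the demo reports, and the linear scaling of context with compression ratio is a simplifying assumption.

```typescript
// Approximate figures quoted in the text.
const TOKENS_PER_SECOND = 30;     // "over 30 tokens per second", used as a lower bound
const RAW_JSON_TOKENS = 5000;     // raw Excalidraw JSON output
const COMPACT_CODE_TOKENS = 50;   // compact code output
const KV_COMPRESSION_RATIO = 2.4; // KV cache compression factor

// Time to generate a given number of output tokens at a steady rate.
function generationSeconds(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

// Assumption: with a fixed memory budget for the KV cache, shrinking each
// cached token by a factor r lets roughly r times as many tokens fit.
function contextAfterCompression(baselineTokens: number, ratio: number): number {
  return Math.floor(baselineTokens * ratio);
}

// Hypothetical baseline, chosen only for illustration.
const HYPOTHETICAL_BASELINE_CONTEXT = 4096;

const rawSeconds = generationSeconds(RAW_JSON_TOKENS, TOKENS_PER_SECOND);
const compactSeconds = generationSeconds(COMPACT_CODE_TOKENS, TOKENS_PER_SECOND);
const compressedContext = contextAfterCompression(
  HYPOTHETICAL_BASELINE_CONTEXT,
  KV_COMPRESSION_RATIO,
);

console.log(`raw JSON: about ${Math.round(rawSeconds)} s per diagram`);
console.log(`compact code: about ${compactSeconds.toFixed(1)} s per diagram`);
console.log(`context in the same memory: about ${compressedContext} tokens`);
```

At the quoted rate, the raw JSON route would take minutes per diagram while the compact route takes a second or two, which is why the output format matters as much as the cache compression.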
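The browser requirements mentioned above can be checked at runtime before downloading any model weights. The following is a minimal sketch, not the demo's actual code: `checkWebGpuSupport` and `evaluateAdapter` are hypothetical helper names, and it assumes the demo depends on the standard WebGPU `"subgroups"` feature. The small interfaces stand in for the real WebGPU types so the snippet compiles on its own.

```typescript
// Minimal structural stand-ins for the WebGPU types used here.
interface AdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

type SupportResult = { ok: true } | { ok: false; reason: string };

// Pure decision logic, kept separate so it can be exercised without a browser.
function evaluateAdapter(gpuPresent: boolean, adapter: AdapterLike | null): SupportResult {
  if (!gpuPresent) {
    return { ok: false, reason: "WebGPU is not available in this browser" };
  }
  if (!adapter) {
    return { ok: false, reason: "No suitable GPU adapter was found" };
  }
  // Assumption: the demo needs the "subgroups" feature.
  if (!adapter.features.has("subgroups")) {
    return { ok: false, reason: "GPU adapter lacks WebGPU subgroup support" };
  }
  return { ok: true };
}

// Async wrapper that queries the browser's GPU entry point.
async function checkWebGpuSupport(gpu: GpuLike | undefined): Promise<SupportResult> {
  if (!gpu) {
    return evaluateAdapter(false, null);
  }
  return evaluateAdapter(true, await gpu.requestAdapter());
}

// In a browser, this would be called roughly as:
//   checkWebGpuSupport((navigator as unknown as { gpu?: GpuLike }).gpu)
//     .then((result) => console.log(result));
```

A check like this only covers feature availability. It does not tell you whether the roughly 3GB of free memory is actually there, which browsers do not expose reliably.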