Someone has quietly shipped a browser-based AI assistant that handles retrieval-augmented generation entirely on the client. The tool, hosted at chatbot-ai-assistant.netlify.app, lets users ask questions, save arbitrary text datasets, and run semantic searches against them — with the embedding model loading and running in the browser itself.
The core trick is client-side embeddings, most likely powered by Transformers.js, the JavaScript port of Hugging Face's library. When a user saves a dataset, the app builds a vector index locally and searches it by semantic similarity. The data never leaves the browser session. No vector database subscription, no retrieval API calls, no data hitting a remote server.
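The retrieval pattern is easy to sketch. In the real app the embedding step would presumably be a Transformers.js feature-extraction pipeline (something like `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')`); here a toy bag-of-words `embed` stands in so the sketch is self-contained, since the tool's actual model and code are not published:

```javascript
// Client-side retrieval sketch: embed texts into vectors, hold them in
// memory, and rank by cosine similarity at query time. The toy hashing
// embed() below is a stand-in for a real Transformers.js pipeline.
function embed(text) {
  const dim = 64;
  const vec = new Float32Array(dim);
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    vec[h % dim] += 1; // bump the hashed bucket for each word
  }
  return vec;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class VectorIndex {
  constructor() { this.entries = []; }
  add(text) { this.entries.push({ text, vec: embed(text) }); }
  search(query, k = 3) {
    const qv = embed(query);
    return this.entries
      .map(e => ({ text: e.text, score: cosine(qv, e.vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

const index = new VectorIndex();
index.add('The embedding model runs entirely in the browser.');
index.add('Bananas are rich in potassium.');
const hits = index.search('browser-based embedding models', 1);
```

With a real sentence-embedding model the ranking would capture meaning rather than word overlap, but the surrounding machinery — an in-memory index and a nearest-neighbor scan — is plausibly all a dataset of this size needs.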
Generative responses work differently. Those route through an external API, with Netlify acting as a static front-end shell around the calls — a familiar split where the latency-tolerant, compute-heavy inference stays server-side while retrieval runs locally. A status indicator in the UI exposes the embedding model's load state, which is a small but honest acknowledgment that spinning up an on-device model isn't instant.
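The split the article describes — local retrieval feeding a remote generation call, gated by a load-state indicator — can be sketched as follows. The endpoint URL and payload shape are hypothetical, since the service behind the app is unknown:

```javascript
// Sketch of the architecture: generation goes out over HTTP while a small
// state machine tracks the on-device embedding model's load status, the way
// the app's UI indicator does. Endpoint and payload are placeholders.
const EMBED_STATUS = { LOADING: 'loading', READY: 'ready', FAILED: 'failed' };

class AppState {
  constructor() { this.embedStatus = EMBED_STATUS.LOADING; }
  setReady() { this.embedStatus = EMBED_STATUS.READY; }
  statusLabel() {
    // Mirrors the UI element that exposes the embedding model's load state.
    return this.embedStatus === EMBED_STATUS.READY
      ? 'Embedding model ready'
      : 'Loading embedding model...';
  }
}

// Hypothetical generation call: only the request shape matters here.
async function generate(prompt, contextChunks) {
  const res = await fetch('https://api.example.com/v1/generate', { // placeholder URL
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, context: contextChunks }),
  });
  if (!res.ok) throw new Error(`generation failed: ${res.status}`);
  return (await res.json()).text;
}

const state = new AppState();
const before = state.statusLabel(); // shown while the model downloads
state.setReady();
const after = state.statusLabel();  // shown once inference can run locally
```

Exposing the load state is worth copying: a browser-hosted model can take several seconds to download and initialize, and a UI that admits this beats one that silently fails searches in the meantime.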
There's no company behind this, no documentation, and no marketing push. It reads as a personal project. But as a working demonstration of how much you can now assemble in a browser without standing up infrastructure, it's worth paying attention to. WebGPU, WASM, and libraries like Transformers.js have made running small models in the browser genuinely practical over the past two years, and this tool shows what that looks like when someone actually wires it into a usable product rather than a benchmark.