RunAnywhere shipped RCLI this week — an open-source voice AI tool for Mac that handles speech recognition, LLM inference, and text-to-speech entirely on-device. No API keys, no cloud connectivity. The Y Combinator W26 startup says end-to-end latency comes in under 200ms, and the tool ships with support for 38 macOS voice actions ranging from Spotify playback controls to triggering system screenshots.
The core of the project is MetalRT, a custom C++ GPU inference engine RunAnywhere built specifically for Apple Silicon's Metal architecture. On M3 chips and newer, the company claims MetalRT sustains 550 tokens per second during LLM inference and transcribes 70 seconds of audio in 101ms using a 4-bit Whisper Tiny model, a ratio they describe as 714x faster than real-time. In their own benchmarks on an M4 Max, MetalRT reportedly outpaces llama.cpp by 1.67x and Apple's MLX framework by 1.19x on decode speed. M1 and M2 machines fall back to llama.cpp; full MetalRT acceleration is reserved for the M3 generation and up. These numbers come from RunAnywhere's internal testing and have not been independently verified.
The underlying pipeline, which RunAnywhere calls FastVoice, uses a multi-threaded C++ architecture with lock-free ring buffers, a 64MB pre-allocated memory pool, KV cache quantization, and async LLM-TTS pipelining. The company says first-audio latency — the time from end of speech to first audio output — sits at 63ms. RCLI also includes local RAG over documents, with hybrid retrieval clocking in at roughly 4ms, and supports more than 20 models.
RunAnywhere has been publishing technical write-ups on the architecture decisions behind MetalRT, suggesting the engine is aimed at developers building voice or agentic applications on Apple hardware rather than just end users.
The longer play is an on-device AI SDK targeting iOS, Android, and edge devices, with Swift, Kotlin Multiplatform, React Native, and Flutter bindings on the roadmap. RCLI is the first public demonstration of that stack. It's available now through Homebrew or a one-line curl install.