RunAnywhere (YC W26) has released RCLI, an open-source command-line tool that runs a complete voice AI pipeline (speech-to-text, LLM inference, and text-to-speech) entirely on Apple Silicon with no cloud dependency. The tool achieves sub-200ms end-to-end latency and supports more than 20 local <a href="/news/2026-03-14-canirun-ai-browser-based-hardware-compatibility-checker-for-local-llms">models</a>, including Qwen3, LFM2, Whisper, and Kokoro, along with local document retrieval via hybrid RAG at roughly 4ms query latency. RCLI installs via a one-line curl script or Homebrew and includes an interactive TUI for model management, push-to-talk voice input, and a browsable catalog of 38 macOS actions that can be triggered by voice through AppleScript and shell commands.
The core technology differentiating RCLI is MetalRT, a proprietary C++ GPU inference engine that RunAnywhere built specifically for the Metal 3.1 API on Apple Silicon. On an Apple M4 Max with 64GB of unified memory, the company reports that MetalRT reaches 658 tok/s peak LLM decode on Qwen3-0.6B (1.67x faster than llama.cpp and 1.19x faster than Apple's own MLX framework), transcribes 70 seconds of audio in 101ms (4.6x faster than mlx-whisper), and <a href="/news/2026-03-14-hume-ai-open-sources-tada-llm-based-tts-with-text-acoustic-synchronization">synthesizes short TTS phrases</a> in 178ms (2.8x faster than mlx-audio). Full acceleration requires an M3 or later chip; on M1 and M2 Macs, MetalRT falls back automatically to llama.cpp. RunAnywhere claims MetalRT is currently the only inference engine to support all three modalities (LLM, STT, and TTS) in a single unified Metal-native engine, unlike Apple's MLX ecosystem, where speech capabilities remain split across separate community-maintained packages.
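The article reports only MetalRT's absolute numbers plus speedup multipliers; the baseline figures for llama.cpp, MLX, mlx-whisper, and mlx-audio are implied rather than stated. A back-of-envelope check makes those implied baselines explicit (these are derived approximations, not numbers RunAnywhere published):

```cpp
#include <cassert>
#include <cmath>

// Implied baselines, derived by dividing (or multiplying) the claimed
// MetalRT figures by the claimed speedup multipliers. Treat these as
// approximations recovered from the reported ratios, not measurements.
inline double implied_llama_cpp_toks() { return 658.0 / 1.67; }  // ~394 tok/s decode
inline double implied_mlx_toks()       { return 658.0 / 1.19; }  // ~553 tok/s decode
inline double implied_mlx_whisper_ms() { return 101.0 * 4.6; }   // ~465 ms for 70s of audio
inline double implied_mlx_audio_ms()   { return 178.0 * 2.8;  }  // ~498 ms per short phrase
```

Even the implied MLX figure of roughly 553 tok/s would be fast for a 0.6B model, which helps explain why commenters focused on whether the lead over Apple's own framework can last.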
The underlying voice pipeline, which RunAnywhere documents in a blog post called FastVoice, achieves 63ms first-audio latency through a combination of KV cache quantization (FP16 to Q8_0, which the company says cuts memory pressure by 47%), word-level streaming TTS that flushes output every seven words, system-prompt KV caching, and a fully asynchronous LLM-TTS pipeline built on lock-free ring buffers and a pre-allocated 64MB memory pool. The architecture runs VAD, STT, and TTS on concurrent OS threads, with double-buffered sentence synthesis that renders the next sentence while the current one plays back.
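The two mechanisms above — a lock-free handoff between the LLM and TTS threads, and flushing text to the synthesizer every seven words — can be sketched in a few dozen lines. This is an illustrative reconstruction of the pattern FastVoice describes, not RunAnywhere's actual code; the class names, capacity, and API are assumptions:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Minimal single-producer/single-consumer lock-free ring buffer, the kind
// of structure the post describes for the asynchronous LLM -> TTS handoff.
// One slot is sacrificed to distinguish "full" from "empty".
template <typename T, std::size_t N>
class SpscRing {
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer side)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer side)
public:
    bool push(const T& v) {             // called from the LLM decode thread
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                  // called from the TTS synthesis thread
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;     // empty
        out = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);
        return true;
    }
};

// Word-level streaming: group decoded text into chunks of seven words so
// synthesis can start on the first chunk while the LLM is still decoding.
std::vector<std::string> flush_chunks(const std::string& text,
                                      std::size_t words_per_flush = 7) {
    std::istringstream in(text);
    std::vector<std::string> chunks;
    std::string word, chunk;
    std::size_t count = 0;
    while (in >> word) {
        chunk += (count ? " " : "") + word;
        if (++count == words_per_flush) {
            chunks.push_back(chunk);
            chunk.clear();
            count = 0;
        }
    }
    if (!chunk.empty()) chunks.push_back(chunk);  // flush trailing partial chunk
    return chunks;
}
```

In a real pipeline the producer would `push` each seven-word chunk into the ring as it completes, and the TTS thread would `pop` and synthesize concurrently — which is what lets first audio play long before the full LLM response exists.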
Community reception on Hacker News was cautiously positive. One commenter, posting as throwaway_mleng, called RCLI "the most technically complete local voice stack I've seen on Mac" while flagging that the Homebrew installation produced checksum mismatch errors on a fresh M3 Pro. Others questioned action-grounding fidelity (the tool reportedly confirmed completing actions it had not actually executed) and raised uncertainty about the overall product positioning. Feature requests included broader Hugging Face model selection and Unsloth quantization support. The sharpest strategic risk is one RunAnywhere cannot engineer around: Apple controls both the Metal API roadmap and MLX development, so the performance edge the company currently holds over Apple's own framework could narrow with any future framework or hardware update. RunAnywhere's YC W26 backing suggests investors are betting the optimization lead will persist long enough to build developer distribution before that window closes.