A developer has published details of a custom voice-to-text tool, built in a single Claude Code session and designed specifically for managing multiple AI coding agents running in parallel.
The tool has three speed modes. The fastest runs Nvidia's Parakeet v3 locally and returns transcriptions in under a second. A medium mode pairs Parakeet with GPT OSS 120B on Cerebras for LLM-based corrections and still feels near-instant. The slowest calls ElevenLabs Scribe V2 and Claude Opus 4.6, takes around 15 seconds, and targets complex, domain-specific prompts where accuracy matters more than speed.
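The three modes can be pictured as a small dispatch table that pairs a transcription engine with an optional correction step. The sketch below is illustrative only: the developer's actual code is not published in this summary, and every function name here is a hypothetical stub that returns canned text rather than calling Parakeet, Cerebras, ElevenLabs, or Claude.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stand-ins for the real engines; each returns canned text.
def parakeet_local(audio: bytes) -> str:
    return "refactor the off module"  # a plausible raw mis-hearing


def scribe_cloud(audio: bytes) -> str:
    return "refactor the auth module"


def fast_llm_fix(text: str) -> str:
    # Stands in for a quick LLM correction pass.
    return text.replace("off module", "auth module")


def careful_llm_fix(text: str) -> str:
    # Stands in for a slower, more thorough LLM pass.
    return text[0].upper() + text[1:] + "."


@dataclass(frozen=True)
class Mode:
    name: str
    transcribe: Callable[[bytes], str]
    correct: Optional[Callable[[str], str]] = None


MODES = {
    "fast": Mode("fast", parakeet_local),
    "medium": Mode("medium", parakeet_local, fast_llm_fix),
    "slow": Mode("slow", scribe_cloud, careful_llm_fix),
}


def run(mode_name: str, audio: bytes) -> str:
    """Transcribe, then apply the mode's correction step if it has one."""
    mode = MODES[mode_name]
    text = mode.transcribe(audio)
    return mode.correct(text) if mode.correct else text
```

Keeping the correction step optional means the fastest mode skips the LLM round trip entirely, which is consistent with the sub-second latency described above.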
The standout feature is tight integration with Zellij, a terminal workspace manager. The tool routes transcribed text to the correct tab or pane even after the user has switched focus elsewhere, and it handles multiple transcriptions running concurrently, a practical necessity when several Claude Code sessions are active at once. Different keyboard shortcuts trigger different post-processing prompts, letting the developer shift between firing off quick commands and dictating detailed instructions without interrupting flow.
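One way to get this behaviour is to record the destination pane and the chosen shortcut at the moment recording starts, then carry both along with the job until the text is ready. The following is a minimal sketch under that assumption, not the developer's implementation: the pane identifiers, shortcut names, and prompts are hypothetical, transcription is simulated with a delay, and delivery is modelled as appending to a list because the summary does not say how text is actually sent to Zellij.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical mapping from keyboard shortcut to post-processing prompt.
PROMPTS = {
    "quick": "Rewrite as a terse command.",
    "detailed": "Clean up into a detailed instruction.",
}


@dataclass(frozen=True)
class Job:
    target: str    # hypothetical pane id, captured when recording starts
    shortcut: str  # which shortcut started the recording
    spoken: str    # stands in for the recorded audio
    delay: float   # simulated transcription latency in seconds


async def transcribe(job: Job) -> str:
    # Simulate a slow engine; a real tool would call its model here.
    await asyncio.sleep(job.delay)
    return f"{job.spoken} <{PROMPTS[job.shortcut]}>"


async def handle(job: Job, delivered: list[tuple[str, str]]) -> None:
    text = await transcribe(job)
    # Deliver to the pane stored on the job, not whichever pane has focus now.
    delivered.append((job.target, text))


async def run_all(jobs: list[Job]) -> list[tuple[str, str]]:
    delivered: list[tuple[str, str]] = []
    # All jobs are in flight at once; each finishes on its own schedule.
    await asyncio.gather(*(handle(job, delivered) for job in jobs))
    return delivered
```

Because the target travels with the job, a slow transcription that finishes after the user has moved to another tab still lands where it was dictated.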
The audio capture layer is built on FluidAudio, the same library underlying many commercial voice-to-text apps. Costs are modest: ElevenLabs' basic plan runs around $5 per month, less than most commercial alternatives, and Cerebras inference keeps the medium mode fast without adding significant expense.
The developer had previously used SuperWhisper and VoiceInk, but accumulated friction with tools not designed around an agentic workflow eventually pushed them to build something purpose-fit. Notably, Claude Code was used to build the tool itself — the same system the tool is now optimised to support.