When Hume AI's nine-person research team ran TADA through 1,000-plus samples from the LibriTTS-R benchmark, the hallucination count came back at zero. Not near-zero — zero. For anyone who has shipped production voice agents and spent time patching around phantom insertions and dropped words, that number alone is worth paying attention to.
TADA (Text-Acoustic Dual Alignment) is Hume's newly open-sourced LLM-based text-to-speech architecture, and its design is deliberately simple: one continuous acoustic vector per text token, enforced throughout the autoregressive loop. That structural constraint is precisely what kills hallucinations. The model simply cannot skip or insert content, so reliability becomes a property of the architecture rather than something bolted on after training.
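The alignment constraint is easy to see in miniature. The toy loop below is a sketch only (the model internals are made up, not Hume's implementation); what it illustrates is that a generator indexed by text tokens emits exactly one acoustic latent per token, so the output can never be longer or shorter than the input:

```python
import numpy as np

def generate_acoustic(text_tokens, dim=8, seed=0):
    """Toy text-aligned autoregressive loop: one latent per text token."""
    rng = np.random.default_rng(seed)
    latents = []
    for tok in text_tokens:
        # Condition on the previous acoustic state (zeros at the start);
        # the loop structure itself forbids skipping or inserting a step.
        context = latents[-1] if latents else np.zeros(dim)
        latents.append(np.tanh(context + rng.standard_normal(dim)))
    return np.stack(latents)

latents = generate_acoustic(["hel", "lo", "world"])
# Output length equals input length by construction.
assert latents.shape == (3, 8)
```

Because generation is indexed by text tokens rather than audio frames, phantom insertions and dropped words are impossible by construction; a real decoder would condition on learned text embeddings and prior acoustic state in the same per-token fashion.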
The performance implications spread outward from there. Operating at just 2–3 frames per second of audio rather than the 12.5–75 frames typical of competing LLM-based TTS systems, TADA hits a real-time factor of 0.09, more than five times faster than comparable approaches. The same token efficiency makes it unusually frugal with context. A conventional interleaved TTS setup exhausts a 2,048-token window in roughly 73 seconds of audio, because text tokens (~3 per second) and speech tokens (~25 per second) stack on top of each other; TADA, aligned only to text-rate tokens, stretches that same budget to around 680 seconds. That directly addresses a real limitation of current systems for voice agents that need to hold state across long conversations, or to narrate anything longer than a minute without resetting. "We designed TADA to accelerate progress toward efficient, reliable voice generation," the Hume team wrote in the release announcement.
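The context-budget numbers follow directly from the quoted token rates (the rates are approximations from the release; the arithmetic itself is just division):

```python
TEXT_TPS = 3      # ~text tokens per second
SPEECH_TPS = 25   # ~speech tokens per second in interleaved systems
WINDOW = 2048     # context window size in tokens

# Interleaved setup: text and speech tokens share the same window.
interleaved_seconds = WINDOW / (TEXT_TPS + SPEECH_TPS)
# Text-aligned setup: only text-rate tokens consume context.
tada_seconds = WINDOW / TEXT_TPS

print(f"interleaved: {interleaved_seconds:.0f} s")   # prints "interleaved: 73 s"
print(f"text-aligned: {tada_seconds:.0f} s")         # prints "text-aligned: 683 s"
```

The ratio between the two budgets is simply (3 + 25) / 3 ≈ 9.3, which is where the jump from about a minute of audio to over eleven minutes comes from.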
On voice quality, the model holds up reasonably well against systems trained on far more data. In human evaluations on the expressive EARS dataset, TADA scored 4.18/5.0 for speaker similarity and 3.78/5.0 for naturalness, finishing second overall. Combined with a footprint light enough for on-device deployment, that positions TADA as a credible option for edge and mobile voice interfaces where cloud round-trips are a liability.
Hume is releasing two models under the MIT license: TADA-1B, an English model built on Llama 3.2 1B, and TADA-3B-ML, a multilingual variant covering 10 languages built on Llama 3.2 3B. Both are on Hugging Face, installable via the hume-tada PyPI package, with a live demo on Hugging Face Spaces. The research paper (arXiv:2602.23068) is authored by Trung Dang, Sharath Rao, and seven colleagues including Alan Cowen.
The model has documented weak spots. Generations longer than 10 minutes can show speaker drift, and quality degrades when the model is asked to generate text and speech simultaneously — a limitation the team is addressing with a technique called Speech Free Guidance. Whether those constraints matter depends heavily on the use case, but for anything under 10 minutes they do not appear to be blockers.