Researchers from UC San Diego and Carnegie Mellon University have published a paper introducing Trilobyte, a byte-level tokenization scheme that allows autoregressive language models to perform lossless audio compression at full professional fidelity. The work, authored by Phillip Long, Zachary Novack, and Chris Donahue and submitted to Interspeech 2026, addresses a longstanding limitation: prior research showing that language models could outperform the FLAC codec was confined to 8-bit audio, leaving open the question of whether such approaches could scale to the 16-bit and 24-bit depths used in professional recording and distribution.

The core problem Trilobyte solves is vocabulary explosion. Standard sample-level tokenization requires a vocabulary of 65,536 tokens for 16-bit audio and over 16.7 million for 24-bit, making direct language-model-based compression computationally intractable at higher bit depths. By decomposing audio samples into their constituent bytes instead, Trilobyte reduces vocabulary scaling from O(2^b) to O(1) in bit depth: the vocabulary stays at 256 regardless of bit depth, at the cost of a token sequence that is b/8 times longer. This enables the first tractable 24-bit LM-based lossless compression. The paper benchmarks the approach across music, speech, and bioacoustics at sampling rates from 16 kHz to 48 kHz, finding that bit depth is the primary limiting factor for performance — not audio domain or sampling rate.
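The byte decomposition can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the actual byte order and sign convention Trilobyte uses are assumptions here.

```python
def byte_tokenize(samples, bit_depth):
    """Split each signed integer PCM sample into bit_depth // 8 bytes
    (little-endian two's complement assumed), so every token fits in a
    fixed 256-entry vocabulary regardless of bit depth."""
    n = bit_depth // 8
    tokens = []
    for s in samples:
        tokens.extend(s.to_bytes(n, byteorder="little", signed=True))
    return tokens

def byte_detokenize(tokens, bit_depth):
    """Inverse mapping: regroup consecutive bytes back into samples."""
    n = bit_depth // 8
    return [int.from_bytes(bytes(tokens[i:i + n]), "little", signed=True)
            for i in range(0, len(tokens), n)]

# Four 16-bit samples become eight byte tokens; the round trip is lossless.
samples = [0, -1, 32767, -32768]
tokens = byte_tokenize(samples, 16)
restored = byte_detokenize(tokens, 16)
```

Note the trade-off this makes explicit: a sample-level 16-bit tokenizer would need 65,536 vocabulary entries for a sequence of length 4, while the byte-level scheme needs only 256 entries for a sequence of length 8.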

The results show a clear pattern of diminishing returns as bit depth increases. At 8-bit, the paper reports language models outperforming FLAC by an average of 217 percent, consistent with earlier findings from WaveNet-era research. That advantage narrows to roughly 18 percent at 16-bit and becomes more modest still at 24-bit, suggesting that while LM-based compression holds an edge at CD quality, the gains are less dramatic than initial 8-bit benchmarks implied. The study used decoder-only Transformer architectures with arithmetic coding, evaluating both pre-trained large language models and models trained from scratch on audio data.
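The link between model quality and compressed size in this setup comes from a standard information-theoretic fact: arithmetic coding driven by a language model compresses a token stream to approximately the model's total negative log2-likelihood. The sketch below illustrates that bound with hypothetical probabilities; a real system would query the LM for the probability of each observed byte.

```python
import math

def compressed_bits(token_probs):
    """Approximate size, in bits, of an arithmetically coded stream when
    the coder is driven by a model that assigned `token_probs[i]` to the
    i-th token actually observed (Shannon bound, ignoring small coder
    overhead). Better-calibrated models -> higher probs -> fewer bits."""
    return sum(-math.log2(p) for p in token_probs)

# Toy illustration with made-up probabilities: a model assigning 0.5 to
# each of 8 byte tokens yields ~8 bits, versus 64 bits for the raw bytes.
bits = compressed_bits([0.5] * 8)   # 8.0
raw_bits = 8 * 8                    # 64
```

This is why the 8-bit advantage over FLAC is so large and the 24-bit advantage so much smaller: the low-order bytes of high-bit-depth audio are close to noise, so the model's probabilities for them approach uniform and the per-token savings shrink.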

Professional audio workflows start at a minimum of 16 bits, and most prior academic work on LM-based compression never reached that threshold, which kept it disconnected from real production use. With Trilobyte's code now public, audio engineers and researchers have a working baseline for neural lossless codecs that operates at the bit depths actually used in studios and streaming pipelines, a gap the field had been unable to close until now.