FP4 has only 16 bit patterns, which works out to 15 distinct values once you notice zero gets encoded twice. Neural networks use it anyway. John Cook's walkthrough of 4-bit floating point lays out exactly what happens when you compress a number into four bits. The dominant format, called E2M1, gives you one sign bit, two exponent bits, and one mantissa bit, for a range of -6 to +6. And yes, it still has both positive and negative zero, because even at four bits, floating point insists on being weird.
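To make that concrete, here's a minimal, dependency-free sketch (not from Cook's post) that decodes every E2M1 bit pattern under the usual convention: exponent bias of 1, subnormals when the exponent field is zero, and no inf/NaN encodings.

```python
# Decode all 16 E2M1 bit patterns: 1 sign bit, 2 exponent bits, 1 mantissa bit,
# exponent bias 1, no inf/NaN. Patterns 0000 and 1000 both decode to zero.
def decode_e2m1(bits: int) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11   # 2 exponent bits
    man = bits & 0b1           # 1 mantissa bit
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = 0
        return sign * (man / 2)
    # Normal: implicit leading 1, biased exponent
    return sign * (1 + man / 2) * 2 ** (exp - 1)

for pattern in range(16):
    print(f"{pattern:04b} -> {decode_e2m1(pattern):+.1f}")
```

Running it prints the full grid: 0, 0.5, 1, 1.5, 2, 3, 4, 6 and their negatives, including the promised +0.0 and -0.0.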

The whole point is memory. Cutting each parameter from 16 bits to 4 means fitting four times as many parameters into the same VRAM. The tradeoff is brutal precision loss, but empirically, models still work.
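The arithmetic is easy to sanity-check. A rough sketch, assuming a hypothetical 7-billion-parameter model and counting weights only (no activations, no quantization scale metadata):

```python
# Weight memory for a hypothetical 7B-parameter model at different precisions.
params = 7e9
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    gib = params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{name}: {gib:.1f} GiB")
```

That's roughly 13 GiB at FP16 versus about 3.3 GiB at FP4, which is the difference between needing a data-center card and fitting on a consumer GPU.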

Nvidia's Blackwell architecture supports E2M1 natively (Hopper bottoms out at FP8). PyTorch added native float8 dtypes in version 2.1 and has been building out FP4 support in more recent releases. AMD's ROCm software stack has been expanding similar low-precision support on its Instinct MI300 hardware. The Open Compute Project's Microscaling (MX) spec is meant to keep these formats interoperable instead of letting them become vendor traps.
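If you want to see which of these dtypes your own PyTorch build exposes, a probe like the following is safe across versions; the FP4 dtype name below is an assumption based on recent releases, and older builds will simply report it as missing.

```python
# Check which low-precision dtypes this PyTorch build exposes.
# Dtype names vary by version; hasattr keeps this from crashing on older releases.
import torch

candidates = ["float16", "bfloat16", "float8_e4m3fn", "float8_e5m2", "float4_e2m1fn_x2"]
for name in candidates:
    status = "available" if hasattr(torch, name) else "not in this build"
    print(f"{name:>18}: {status}")
```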

Cook provides a full value table and working Python code using the Pychop library to emulate FP4 arithmetic. If you're building tooling around quantized models or just want to understand what your hardware is actually doing, that's your practical starting point.
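If you'd rather not pull in a dependency just to poke at this, the core of FP4 quantization fits in a few lines. This is not Cook's code or pychop's API, just a sketch that clamps to the representable range and snaps to the nearest E2M1 value (ties broken by list order, not IEEE rounding):

```python
# Round-to-nearest quantization onto the E2M1 grid, to show how much gets lost.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_VALUES += [-v for v in E2M1_VALUES if v != 0.0]

def quantize_e2m1(x: float) -> float:
    # Clamp into the representable range, then snap to the closest grid point.
    x = max(-6.0, min(6.0, x))
    return min(E2M1_VALUES, key=lambda v: abs(v - x))

for x in [0.1, 0.7, 1.2, 2.4, 3.7, 5.9, 100.0]:
    q = quantize_e2m1(x)
    print(f"{x:6.1f} -> {q:4.1f}  (error {abs(x - q):.2f})")
```

The last case shows the other half of the story: anything past 6 saturates, so quantizers in practice rescale tensors (per block or per channel) before snapping values to this grid.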