23.
TurboQuant+ shrinks KV cache memory 4.75x (x.com)
TurboQuant+ shrinks KV cache memory 4.75x with 3-bit quantization across CUDA and Metal while preserving near-fp8 top-5 behavior
1 appearance on the backlist front page in the last 30 days.