NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
Source: DEV Community
There are now enough KV cache compression papers that "we beat the competition" is meaningless without specifics. Which competition? On which data? At which compression ratio? With or without calibration? This post is an honest head-to-head. For each competitor: what they do, their reported numbers from their papers, our numbers, where we win, and where they win.

## The comparison table

| Method | Compression | Quality delta | Training-free | When it wins |
| --- | --- | --- | --- | --- |
| NexusQuant (short ctx) | 10x | +0.14 to +0.90% | Yes | Training-free above 6x |
| NexusQuant (long ctx) | 16.6x | +0.82% | Yes | Best training-free at 16x |
| KVTC (NVIDIA) | up to 20x | < 1 pp | No (10 min calibration) | Highest compression |
| TurboQuant (Google) | ~5-6x | ~0% | Yes | Best quality at low compression |
| CommVQ (Apple) | ~8x | ~0% | No (training) | Best quality at 8x |
| Palu | 11.4x | ~+1.19% | No (calibration) | Low-rank if you have calibration data |

None of these numbers are from our experiments. The KVTC, TurboQuant, CommVQ, and Palu numbers are taken from their papers. We have not run their code on our data. That
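To make the compression ratios in the table concrete, here is a minimal sketch of what an Nx KV cache compression means in bytes. The model dimensions are illustrative (roughly 7B-class, 32 layers, 32 KV heads, head dim 128) and are assumptions, not numbers from any of the papers above; the standard FP16 KV cache formula is 2 tensors (K and V) per layer times heads, head dim, sequence length, and bytes per element.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Uncompressed KV cache size in bytes: K and V tensors per layer, FP16 by default."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 7B-class model at a 4096-token context (illustrative dimensions).
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(f"baseline : {baseline / 2**30:.2f} GiB")  # → 2.00 GiB

# Footprint at the compression ratios from the table above.
for label, ratio in [("10x", 10.0), ("16.6x", 16.6), ("20x", 20.0)]:
    print(f"{label:>6s}   : {baseline / ratio / 2**30:.2f} GiB")
```

At a 4096-token context the gap between 10x and 16.6x is only about 80 MiB per sequence here, but it compounds with context length and batch size, which is why the long-context row is the interesting one.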