@che_shr_cat on Backlist

5 appearances on the backlist front page in the last 30 days.

15.

1/ We have spent years optimizing the KV cache via head-sharing (GQA/MQA), but we ignored a fundamental assumption: why do Transformers need three separate Q, K, and V projections in the first place? Turns out, they don't. Merging them unlocks…

by @che_shr_cat (Grigory Sapunov) · backlist 2026-06-09 · rubric 82.0