36.
My longest blog post to date, and it is dense! Long-context LLMs make KV cache memory the bottleneck: every cached token carries K/V tensors at every layer. I wrote a survey + code-first guide to KV cache compression: Attention Sink, L2,