@paddix on Backlist

64.

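Serving LLMs is expensive because decoding is bound by memory bandwidth, not raw compute. KV caching handles the compute side: each token's K/V tensors are stored once and reused at every later step, so you skip recomputing keys and values for the whole prefix each time you generate a token. The trade is that the cache itself has to be read back from memory on every step. Pairing it with p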
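To make the reuse concrete, here is a minimal sketch of a single attention head decoding with and without a KV cache. It is pure Python with no framework, and everything in it (`Head`, `make_tokens`, the toy dimension `D`, the random weights) is a hypothetical stand-in rather than any real serving stack. It only counts K/V projections to show where the saving comes from.

```python
import math
import random

D = 4  # toy head dimension (assumption, real models use far larger values)


def matvec(W, x):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention for one query over all given positions."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(q))]


class Head:
    """Hypothetical single attention head with random projection weights."""

    def __init__(self, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

        self.Wq, self.Wk, self.Wv = rand_matrix(), rand_matrix(), rand_matrix()
        self.kv_projections = 0  # how many times K/V were computed for a token

    def project_kv(self, x):
        self.kv_projections += 1
        return matvec(self.Wk, x), matvec(self.Wv, x)


def make_tokens(n, seed=1):
    """Random stand-ins for token hidden states."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(n)]


def decode_no_cache(head, tokens):
    """Recompute K/V for the whole prefix at every step."""
    outputs = []
    for t in range(len(tokens)):
        kv = [head.project_kv(x) for x in tokens[: t + 1]]
        keys = [k for k, _ in kv]
        values = [v for _, v in kv]
        outputs.append(attend(matvec(head.Wq, tokens[t]), keys, values))
    return outputs


def decode_with_cache(head, tokens):
    """Compute K/V once per token, append to the cache, reuse afterwards."""
    keys, values, outputs = [], [], []
    for x in tokens:
        k, v = head.project_kv(x)  # only the new token is projected
        keys.append(k)
        values.append(v)
        outputs.append(attend(matvec(head.Wq, x), keys, values))
    return outputs
```

With 6 tokens, the uncached path does 1 + 2 + ... + 6 = 21 K/V projections and the cached path does 6, while both produce the same attention outputs. Note that `attend` still walks the full `keys` and `values` lists at every step, which is the per-step memory read that keeps decoding bandwidth-bound.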

by @paddix (Paddy Srinivasan) · backlist 2026-06-02 · rubric 71.0