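Serving LLMs is expensive because decoding is bound by memory bandwidth, not raw compute. KV caching removes the redundant work: it stores each token's K/V tensors once and reuses them at every later step, so each step computes attention for only the newest token instead of recomputing the whole prefix. It does not lift the bandwidth bound, though, since every step still reads the model weights and a cache that grows with sequence length. Pairing it with p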