33.
Why is the KV cache one of the main reasons LLMs are fast?
The KV cache is what connects the attention mechanism to the generation stage of autoregressive models. These models generate text token by token, but each new token still attends to all previou
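
The caching idea can be illustrated with a minimal sketch, assuming a single attention head and toy 2-dimensional embeddings. The names `KVCache`, `step_without_cache`, and the weight matrices `W_Q`, `W_K`, `W_V` are hypothetical and chosen only for illustration; they are not taken from any library. The sketch compares recomputing keys and values for the whole prefix at every step against storing them once and appending one new entry per generated token.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all keys and values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

# Toy projection matrices (hypothetical values, assumed for this sketch).
W_Q = [[0.5, -0.2], [0.1, 0.3]]
W_K = [[0.4, 0.1], [-0.3, 0.2]]
W_V = [[0.2, 0.6], [0.7, -0.1]]

def step_without_cache(embeddings):
    # Recomputes K and V for every token in the prefix at each step.
    keys = [matvec(W_K, x) for x in embeddings]
    values = [matvec(W_V, x) for x in embeddings]
    q = matvec(W_Q, embeddings[-1])
    return attend(q, keys, values)

class KVCache:
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x):
        # Projects only the newest token and reuses all stored K and V.
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        q = matvec(W_Q, x)
        return attend(q, self.keys, self.values)

tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
cache = KVCache()
for t, x in enumerate(tokens):
    cached = cache.step(x)
    recomputed = step_without_cache(tokens[: t + 1])
    # Both paths give the same attention output for the newest token.
    assert all(math.isclose(a, b) for a, b in zip(cached, recomputed))
```

In this sketch the uncached path performs a number of key and value projections that grows with the prefix length at every step, while the cached path performs exactly one of each per step. The attention over stored keys still scales with the prefix length, so the cache removes redundant projection work rather than the attention cost itself, and it does so at the price of memory that grows with the sequence.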