62.
v0.21.0 added the KV Offload + Hybrid Memory Allocator (HMA) feature. The KV cache can now be offloaded to regular (host) memory even for models with hybrid attention, so this is definitely worth enabling. The relevant option is --kv-offloading-size.