@JPSV_calif on Backlist

62.

v0.21.0 added the KV Offload + Hybrid Memory Allocator (HMA) feature. The KV cache can now be offloaded to system (CPU) memory even for models with hybrid attention, so it is well worth enabling. The flag to set is --kv-offloading-size.
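A minimal sketch of how the flag might be passed, under several assumptions. The post does not name the tool, so the `vllm serve` entry point here is a guess. The model name is a hypothetical placeholder, and the value 16 is illustrative only: check the release's own help output for the unit the flag expects. The snippet only assembles and prints the command rather than launching a server.

```shell
#!/bin/sh
# Assumption: the post refers to a server started with `vllm serve`.
# MODEL is a hypothetical placeholder, not a real checkpoint.
MODEL="your-org/your-hybrid-attention-model"

# Illustrative size only; the unit is an assumption, so confirm it
# with the tool's --help output before relying on this number.
KV_OFFLOAD_SIZE=16

# Build the command line with the flag named in the post.
CMD="vllm serve $MODEL --kv-offloading-size $KV_OFFLOAD_SIZE"

# Print it for review instead of executing it.
echo "$CMD"
```

Once the printed command looks right for your setup, run it directly and watch host memory usage to confirm the offload buffer is actually in use.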

by @JPSV_calif (JP) · backlist 2026-06-04 · rubric 78.0