@Gauri_the_great on Backlist

48.

Flipping the loop order in the attention kernel, so that it iterates over KV blocks in the outer loop and queries in the inner loop, made it 4× faster than open-source sparse attention kernels. Damn!!
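The post does not show the kernel itself, so here is only a minimal pure-Python sketch of what "KV blocks as the outer loop" can look like for dense softmax attention. The function names, the `block` parameter, and the running-max/running-sum (online softmax) bookkeeping are assumptions made for illustration, not the author's implementation. The sketch shows that both loop orders give the same output; it says nothing about the reported 4× speedup, which depends on GPU memory traffic and sparsity patterns that plain Python cannot model.

```python
import math


def attention_q_outer(Q, K, V):
    """Reference order: outer loop over queries, inner loop over all keys."""
    dim = len(V[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)                       # subtract max for stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[j] * V[j][d] for j in range(len(V))) / total
                    for d in range(dim)])
    return out


def attention_kv_outer(Q, K, V, block=2):
    """Flipped order: outer loop over KV blocks, inner loop over queries.

    Each query carries running softmax state across KV blocks
    (hypothetical bookkeeping, assumed for this sketch).
    """
    n, dim = len(Q), len(V[0])
    m = [-math.inf] * n                       # running max score per query
    l = [0.0] * n                             # running softmax denominator
    acc = [[0.0] * dim for _ in range(n)]     # running unnormalized output
    for start in range(0, len(K), block):     # outer loop: KV blocks
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        for i, q in enumerate(Q):             # inner loop: queries
            scores = [sum(a * b for a, b in zip(q, k)) for k in Kb]
            m_new = max(m[i], max(scores))
            scale = math.exp(m[i] - m_new)    # 0.0 on the first block
            l[i] *= scale
            acc[i] = [a * scale for a in acc[i]]
            for s, v in zip(scores, Vb):
                p = math.exp(s - m_new)
                l[i] += p
                for d in range(dim):
                    acc[i][d] += p * v[d]
            m[i] = m_new
    return [[a / l[i] for a in acc[i]] for i in range(n)]
```

In a sparse variant, the inner loop would presumably visit only the queries that attend to the current KV block, so each KV block is loaded once and reused across those queries. That is a guess at why the flipped order helps, not something the post states.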

by @Gauri_the_great (Gauri Tripathi) · backlist 2026-06-01 · rubric 91.0