83.
The Renaissance of Sparse Attention (older dilated patterns like Longformer/LongNet, compressed like DeepSeek, query-aware l… (x.com)
The Renaissance of Sparse Attention (older dilated patterns like Longformer/LongNet, compressed like DeepSeek, query-aware like MiniMax) vs. hot linear attention/recurrence: two separate lines of long-context scaling. We have a series of works with @