11.
DeepSeek sparse attention’s hidden fp32 tensor problem
DeepSeek’s compressed sparse attention can still materialize a massive fp32 indexing tensor before top-k selection, which makes long-context memory behavior nontrivial
1 appearance on the backlist front page in the last 30 days.
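To give a rough sense of scale for the memory concern in the summary, here is a minimal back-of-the-envelope sketch. It is not taken from DeepSeek's code. It assumes the indexing scores are held as one dense fp32 tensor of shape [batch, q_len, kv_len / compress_ratio] before top-k runs. The function name `index_score_bytes` and the parameters `compress_ratio` and `batch` are hypothetical, introduced only for illustration.

```python
def index_score_bytes(q_len, kv_len, batch=1, compress_ratio=1, bytes_per_elem=4):
    """Estimate bytes for a dense score tensor held before top-k selection.

    Assumption (not from the source): one score per (query, compressed key)
    pair, stored as fp32 (4 bytes), with no chunking over the query axis.
    """
    # Ceil division: number of key slots left after hypothetical compression.
    n_keys = -(-kv_len // compress_ratio)
    return batch * q_len * n_keys * bytes_per_elem


if __name__ == "__main__":
    # Full-sequence prefill at a few context lengths, uncompressed and 4x compressed.
    for ctx in (32_768, 131_072, 1_048_576):
        dense = index_score_bytes(ctx, ctx) / 2**30
        packed = index_score_bytes(ctx, ctx, compress_ratio=4) / 2**30
        print(f"{ctx:>9} tokens: {dense:,.0f} GiB dense, {packed:,.0f} GiB at 4x compression")
```

Under these assumptions the tensor grows with the product of query and key lengths, so even a constant-factor compression of the key axis leaves quadratic growth in place. Processing queries in chunks would bound the peak, at the cost of running top-k once per chunk.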