12.
Latent Context Language Models compress context 16×
LCLMs compress token context into latent vectors and claim a better latency–accuracy frontier for long-context inference
2 appearances on the backlist front page in the last 30 days.
We trained language models that compress massive contexts into tiny latent representations. Latent Context Language Models (LCLMs) outperform existing KV cache compression methods on the latency/accuracy frontier. 1/10