11.
Latent Context Language Models make long-context inference up to 8.8× faster
A 0.6B encoder compresses long context into latent vectors for a 4B decoder, reducing long-context cost while preserving accuracy
1 appearance on the backlist front page in the last 30 days.