We patched LFM2.5-350M (pre-trained on 28T tokens) to transform a causal decoder into a bidirectional encoder. It worked extremely well: both Embedding and ColBERT models get best-in-class performance, especially for multi/cross-lingual ta