@lateinteraction on Backlist

20.

Stop reporting 0.2% gains on saturated retrieval benchmarks

OBLIQ-Bench is proposed as a harder test for retrieval models and long-context LLMs, now that older benchmarks no longer distinguish real progress.

by @lateinteraction (Omar Khattab) · backlist 2026-06-01 · rubric 84.0