20.
Stop reporting 0.2% gains on saturated retrieval benchmarks
OBLIQ-Bench is proposed as a harder test for retrieval models and long-context LLMs when older benchmarks no longer distinguish real progress
1 appearance on the backlist front page in the last 30 days.