@HuggingPapers on Backlist

19.

FastContext: a 4B codebase explorer for coding agents

Microsoft’s repo-exploration model offloads code search from the main coding agent, cutting tokens by up to 60% while improving SWE-bench scores

by @HuggingPapers (DailyPapers) · backlist 2026-06-15 · rubric 0.0

9.

FlashMemory: 90% smaller KV cache at 500K context

Lookahead Sparse Attention claims long-context memory compression without the usual accuracy collapse, attacking one of the main inference cost centers

by @HuggingPapers (DailyPapers) · backlist 2026-06-14 · rubric 52.0

23.

MiniMax MaxProof crosses the human gold-medal threshold

A test-time proof-search system reportedly scored 35/42 on IMO 2025 and 36/42 on USAMO 2026

by @HuggingPapers (DailyPapers) · backlist 2026-06-13 · rubric 42.0

12.

GrepSeek: search agents that grep raw text without embeddings or an index

A 9B open-weight model learns to write shell pipelines over a 14GB corpus and beats indexed retrieval baselines across several open-domain QA benchmarks

by @HuggingPapers (DailyPapers) · backlist 2026-06-01 · rubric 90.0

34.

LongTraceRL

LongTraceRL teaches LLMs to reason through 128K contexts by learning from search-agent trajectories and fine-grained, entity-level rubric rewards

by @HuggingPapers (DailyPapers) · backlist 2026-06-01 · rubric 93.0

57.

Generative supervision unlocks embodied intelligence

Tencent Hunyuan and Tsinghua University release GEM, a VLM that learns physical grounding by predicting depth maps during pre-training, achieving state-of-the-art results on embodied ben…

by @HuggingPapers (DailyPapers) · backlist 2026-05-31 · rubric 82.0

24.

ResearchMath-14K: 14K open research-level math problems

A multi-agent pipeline collected the largest open dataset of frontier research-level math problems for evaluating mathematical reasoning

by @HuggingPapers (DailyPapers) · backlist 2026-05-30 · rubric 68.0

64.

ResearchMath-14K: 14K open research-level math problems

Curated by agents from academic sources, with 220K reasoning traces. Fine-tuning on filtered attempts improves Qwen3 by 9.2 points. Newer models also make 5x more fake references.

by @HuggingPapers (DailyPapers) · backlist 2026-05-30 · rubric 76.0