FastContext: a 4B codebase explorer for coding agents
Microsoft’s repo-exploration model offloads code search from the main coding agent, cutting tokens by up to 60% while improving SWE-bench scores
Lookahead Sparse Attention claims long-context memory compression without the usual accuracy collapse, attacking one of the main inference cost centers
A test-time proof-search system reportedly scored 35/42 on IMO 2025 and 36/42 on USAMO 2026
A 9B open-weight model learns to write shell pipelines over a 14GB corpus and beats indexed retrieval baselines across several open-domain QA benchmarks
LongTraceRL teaches LLMs to reason through 128K contexts by learning from search-agent trajectories and fine-grained, entity-level rubric rewards.
Generative supervision unlocks embodied intelligence. Tencent Hunyuan and Tsinghua University release GEM, a VLM that learns physical grounding by predicting depth maps during pre-training, achieving state-of-the-art results on embodied ben
A multi-agent pipeline collected the largest open dataset of frontier research-level math problems for evaluating mathematical reasoning
ResearchMath-14K: 14K open research-level math problems, curated by agents from academic sources, with 220K reasoning traces. Fine-tuning on filtered attempts improves Qwen3 by 9.2 points. Newer models also make 5x more fake references.