Backlist — 12 Jun 2026 UTC

Balanced the very strong AI/model news against security, policy, biology, graphics, robotics, markets, and durable builder artifacts to avoid an agent-only slate.

13.

River’s largest revision adds 60% faster throughput, 20x faster backlog draining, arbitrary out-of-band signal waits, timers, and CEL wait expressions

by (Brandur) · backlist 2026-06-12 · rubric 89.0
31.

(x.com)

Result #32: @mihai673 has achieved a 30-step improvement over the old 2026/05/09 record by adding a SODA (Pethick et al. 2026)-style anchor towards init. It is unknown whether this technique can also improve the current record. 2/5

by (Keller Jordan) · backlist 2026-06-12 · rubric 91.0
34.

Tokenminning: Token⋅Min⋅ning. Get the *same quality* work done in the *same time* as your tokenmaxxing peers but with the LEAST amount of tokens. Tokenmaxxing is too easy to hack (just run things in loop, in parallel, etc.). What are some g

by (Robert Yang) · backlist 2026-06-12 · rubric 89.0
35.

(x.com)

I confess: you can't dynamically resize a @modal sandbox. Because you don't have to. Sandbox workloads are spiky: install, wait, spike, wait. We built our runtime to be *burstable*. Request the min & burst above it when your workload s

by (Adam Azzam) · backlist 2026-06-12 · rubric 89.0
45.

(x.com)

Two insights from LeapAlign: 1. Gradient descent, rather than GRPO, is native to diffusion post-training. 2. Early generation steps should be trained, such that image layout can be better optimized. Thanks @hillbig for posting this work.

by (Liang Zheng) · backlist 2026-06-12 · rubric 88.0
49.

(t.co)

Big updates for InferenceBench v1.0.1! Some highlights: 10 more entries to the leaderboard, including Fable 5, Opus 4.8, Kimi 2.6, and Gemini 3.5 Flash; re-scoring / re-evaluation of select models. See the changes for yourself at: htt

by (Jehyeok Yeon @ ICML 2026) · backlist 2026-06-12 · rubric 86.0
51.

Claude 5 Fable (Ultracode): "Make a playable alpine glacial valley at sunrise." No meshes or models. Everything you see is math. Fable screenshotted its own work and iterated. Took ~30 mins, ~500k tokens, ~2500 lines of code, and ~$25. Ext

by (Deedy) · backlist 2026-06-12 · rubric 86.0
57.

(x.com)

Context Arena: Added @AnthropicAI's Claude Opus 4.8 on 8-needle GDM-MRCRv2. Thanks @OpenRouter for the credits to run Opus 4.8 @ max. All results at: https://contextarena.ai Opus 4.8 (max reasoning) lands #2 on AUC@128k, behind only

by (Dillon Uzar) · backlist 2026-06-12 · rubric 86.0
69.

startup data point: founder shows me a customer’s support queue at 9:18pm. 43 tickets. 12 refund edge cases. 3 policy exceptions. 1 angry enterprise account. then asks: “which of these still needs a human?” that is a better pitch than 9

by (GEOFF) · backlist 2026-06-12 · rubric 84.0
74.

I tried this so you don’t have to. At the end, I got: 10,000 impressions; $600 spent; 0-400 clicks (tracking isn’t very good); 0 conversions. I’m probably not going to spend more on this platform at the current stage because it’s a ve

by (Ansh Nanda) · backlist 2026-06-12 · rubric 83.0
84.

prediction: agents will expose a funny lie in enterprise software. half the product surface was not there because users loved it. it was there because humans needed reminders, approvals, queues, status pages, nudges, and meetings to move

by (GEOFF) · backlist 2026-06-12 · rubric 82.0
88.

Mat got a new update, so update your app. Basic fixes plus a big one: collective mats! Invite up to 5 friends over iMessage and decorate one mat together, with live sync and widget updates. Plus: - cutout-letter alphabet everywhere - type words i

by (Rahul Bhadoriya) · backlist 2026-06-12 · rubric 82.0