Backlist — 11 Jun 2026 UTC

Top 90 curated tweets ranked for substance on 11 Jun 2026 UTC.

35.

Fable 5 (@AnthropicAI) scores 22% and tops the Hedge-Bench leaderboard. Running Fable was roughly 2x more expensive than Opus 4.8 per trial. For an industry where accuracy is mission-critical, human judgement isn't going away.

by (Trata (YC W25)) · backlist 2026-06-11 · rubric 89.0
41.

New paper! People treat reasoning trajectories as text, but what if we can do better than that? We show that we can, by training Behavior Forecasters (BFs) that get a reasoning trajectory as input and make more accurate forecasts than front

by (Mosh Levy) · backlist 2026-06-11 · rubric 88.0
42.

What’s new in FrontierCS 2.0:
1. FrontierCS 1.0 algorithmic tasks are now agent-native, containerized, and Harbor-compatible.
2. We are releasing the private test cases for FrontierCS 1.0 algorithmic tasks.
3. Agents can receive controll

by (Qiuyang Mang) · backlist 2026-06-11 · rubric 88.0
50.

agent product smell test:
1. makes a slide = toy
2. fills a form = feature
3. checks the form against source docs = useful
4. sends the form, handles the rejection, updates the system = company
half of “agentic” is just autocomplete weari

by (GEOFF) · backlist 2026-06-11 · rubric 88.0
52.

Another exciting AI-for-AI work from @Recursive_SI, improving the SOTA in nanogpt speedrun Track1 from 79.7s (previous SOTA: https://x.com/classiclarryd/status/2063061926092099868 …) to 77.34s (https://github.com/KellerJordan/modded

by (Yiping Wang) · backlist 2026-06-11 · rubric 87.0
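
For scale, using only the two times quoted in the tweet (the earlier figure may itself be rounded), the new record cuts Track1 wall-clock time by roughly 3%:

```latex
\Delta t = 79.7\,\text{s} - 77.34\,\text{s} = 2.36\,\text{s},
\qquad
\frac{\Delta t}{79.7\,\text{s}} = \frac{2.36}{79.7} \approx 0.0296 \approx 3.0\%
```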
55.

@nibzard built a deep research agent on Steel. Then the evals taught him it was good at the wrong thing: beautiful overviews, weak exact answers. The fix was not another tool. It was routing, durability, and reading the failures.

by (Steel) · backlist 2026-06-11 · rubric 86.0
56.

vibe coding can only take you this far. we had a ghost bug in production at @TensorTonic serving 40k users for 5 months where pages would randomly break and the API would hang for exactly 30 seconds then throw a 500. it became routine

by (pdawg) · backlist 2026-06-11 · rubric 86.0
60.

Design: GQA + top-k indexer
Scoring: SDPA + max pooling (Light house attn? @SubhoGhosh02)
Training: dense warmup + KL loss to match index branch output to main branch attn output; stop gradient at index weight projection

by (Kimbo) · backlist 2026-06-11 · rubric 86.0
61.

The Field Learns to Sew Itself
This animation uses a moving quadratic differential q(z,t)dz², where zeros and double poles steer thousands of particles along the field’s horizontal trajectories, turning the complex plane into a living fabr

by (Mathelirium) · backlist 2026-06-11 · rubric 86.0
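
A minimal Python sketch of the particle motion the post describes, under stated simplifications: q is held fixed instead of moving with t, each particle is advanced with a midpoint (RK2) step, and the example differentials are illustrative choices, not the ones used in the animation. On a horizontal trajectory q(z)dz² is real and positive, so each step is taken along ±1/√q(z), with the sign kept consistent from step to step.

```python
import cmath

# Hypothetical sketch: move a particle along a horizontal trajectory of a
# quadratic differential q(z) dz^2. q is held fixed in time here.


def direction(q, z, prev_dir):
    """Horizontal direction at z, of unit length in the |q|^(1/2)|dz| metric.

    Along a horizontal trajectory q(z) * dz**2 is real and positive, so
    dz is proportional to +/- 1 / sqrt(q(z)). The sign is chosen to agree
    with prev_dir so the particle does not reverse at sqrt's branch cut.
    z must stay away from the zeros and poles of q."""
    d = 1 / cmath.sqrt(q(z))
    if (d * prev_dir.conjugate()).real < 0:
        d = -d
    return d


def trace(q, z0, dir0, h, steps):
    """Midpoint (RK2) integration of one particle; returns its path."""
    z, d = complex(z0), complex(dir0)
    path = [z]
    for _ in range(steps):
        d_half = direction(q, z, d)
        z_mid = z + 0.5 * h * d_half
        d = direction(q, z_mid, d_half)
        z = z + h * d
        path.append(z)
    return path


if __name__ == "__main__":
    # Double pole at 0 with a negative coefficient: trajectories are circles.
    circle_path = trace(lambda z: -1 / z**2, z0=1, dir0=1j, h=0.01, steps=628)
    print(max(abs(abs(z) - 1) for z in circle_path))  # stays close to 0

    # Simple zero at 0 (q(z) = z): starting on the positive real axis, the
    # particle runs outward along it, one of three horizontal rays from the zero.
    print(trace(lambda z: z, z0=0.5, dir0=1, h=0.01, steps=5)[-1])
```

For a double pole q(z) = c/z², the horizontal trajectories are circles around the pole when c < 0 and rays out of it when c > 0, which makes the circle case a convenient correctness check for the integrator.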
64.

This quarter, @elise_ai crossed $200M in annual recurring revenue, our fifth straight year of doubling. Our first $100M took years, the next $100M took twelve months. When we started, a lot of people told us housing and healthcare were

by (Minna Song) · backlist 2026-06-11 · rubric 86.0
67.

FragCoord 1.2
- Pro Mode for publishing tutorials, commercial licenses and early access
- Compute shaders and HDR with WebGPU
- Rebuilt debug modes: Tuner, Inspect, Speed
- Market: for tutorials and commercial licensing

by (Xor) · backlist 2026-06-11 · rubric 86.0
77.

they walked it back 48h after throttling the feeds, HL already softened it from builder feedback:
webData2 stays at 5s one more upgrade
l2Book default drops to 2s
new fastAssetCtxs endpoint keeps the old 5s mark price behavior
infra is

by (CARSON.hl) · backlist 2026-06-11 · rubric 86.0
79.

Maybe first in rodents? Whole-body reprogramming for rejuvenation has still not convincingly worked in healthy mammals. Rejuvenating a cell or a tissue is one thing. Rejuvenating a whole body, safely, is a completely different problem.

by (P. E. Sottas) · backlist 2026-06-11 · rubric 86.0