Backlist — 12 May 2026 UTC

Selected one main Mini Shai-Hulud incident item plus one practical mitigation rather than several near-duplicate compromise reports; AI infrastructure is present but balanced with security, biology, energy, graphics, markets, hardware, and policy.

30.

Launching Agentick: a unified benchmark for training and evaluating general sequential decision-making agents. RL agents, LLMs, VLMs, hybrids, bots, and humans can all be evaluated on: same tasks. same seeds. same score. First result: n

by (Roger Creus Castanyer) · backlist 2026-05-12 · rubric 96.0
31.

(x.com)

Modded-NanoGPT optimization result #13: @benjamintherien has achieved a new record of 3210 steps (-15), by wrapping NorMuonH in a MuLoCo-style outer Nesterov SGD. Compared to the target loss, this result has a p-value of p=1.3e-4. Compar

by (Keller Jordan) · backlist 2026-05-12 · rubric 96.0
38.

cool work :) if you've ever tried world models, you know how easily they break (e.g. stare into grass in minecraft and you'll easily fall OOD) using RL to find adversarial trajectories and then improve the world models is great - esp. if

by (Arnie Ramesh) · backlist 2026-05-12 · rubric 91.0
40.

Crabbox 0.12.0 is live. Azure Windows desktop + WSL2; Proxmox + Tensorlake providers; preflight, failure bundles, phase timing; keep failed boxes around for SSH debugging. Remote test boxes got much less slippery.

by (Peter Steinberger) · backlist 2026-05-12 · rubric 91.0
42.

(x.com)

@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offlo

by (Inception) · backlist 2026-05-12 · rubric 90.0
53.

(x.com)

1/ Following our previous MoE paper w/ @hayou_soufiane (https://arxiv.org/abs/2604.09780), we confirmed that scaling the residual stream: h^{\ell+1} = h^{\ell} + \alpha \Delta^{\ell} improves MoE load balancing at initialization by reduci

by (Xidulu) · backlist 2026-05-12 · rubric 86.0
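
For readers who want the update rule in item 53 spelled out, here is a minimal sketch of the scaled residual step h^{\ell+1} = h^{\ell} + \alpha \Delta^{\ell} on plain Python lists. The function name, the toy vectors, and the value alpha = 0.5 are illustrative assumptions, not taken from the post or the paper, and the sketch shows only the arithmetic of the update, not its effect on MoE load balancing.

```python
from typing import List


def scaled_residual_step(h: List[float], delta: List[float], alpha: float) -> List[float]:
    """Return h + alpha * delta elementwise (hypothetical helper name)."""
    if len(h) != len(delta):
        raise ValueError("h and delta must have the same length")
    return [h_i + alpha * d_i for h_i, d_i in zip(h, delta)]


# Toy values, chosen only for illustration.
h = [1.0, -2.0, 0.5]        # residual stream entering the layer
delta = [0.5, 0.5, -1.0]    # the layer's output (the Delta term)

h_next = scaled_residual_step(h, delta, alpha=0.5)   # scaled residual update
h_plain = scaled_residual_step(h, delta, alpha=1.0)  # alpha = 1: ordinary residual connection
```

With alpha = 1 the step reduces to the usual residual connection; smaller alpha shrinks each layer's contribution to the stream.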
55.

The third semis memo is out. We talk about power & analog semis, orchestration plane in the agentic era, the neoclouds trade, interconnect bottleneck (probably the biggest limiter for 2026-27), Korea Unlocked

by (Zephyr) · backlist 2026-05-12 · rubric 86.0
56.

(x.com)

Today: OpenMed Agent ships in preview. Built on @huggingface: → HF endpoints power clinical extraction + terminology → MCP for your own services → Every tool call, every plan, fully visible 1,000+ OpenMed medical models on HF. Preview

by (Maziyar PANAHI) · backlist 2026-05-12 · rubric 86.0
63.

(x.com)

Congrats to @andrew_li03 and the @JudgementLabs team on their fundraise! I vividly remember back in early 2025 when Andrew explained to me why agent monitoring and evaluation would be so crucial for any enterprise. It’s awesome to see t

by (Zeeshan Patel) · backlist 2026-05-12 · rubric 84.0
70.

1/ The "20 tokens per parameter" Chinchilla scaling law is flawed. It is an artifact of your tokenizer. Scaling shouldn't be measured in tokens at all. It should be measured in bytes.

by (Grigory Sapunov) · backlist 2026-05-12 · rubric 84.0
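
A rough arithmetic sketch of the point in item 70: a fixed tokens-per-parameter ratio corresponds to different amounts of raw data depending on how many bytes a tokenizer packs into each token. The two bytes-per-token averages below are hypothetical round numbers picked for illustration, not measurements of any real tokenizer, and the helper name is an assumption.

```python
def bytes_per_parameter(tokens_per_param: float, avg_bytes_per_token: float) -> float:
    """Convert a tokens-per-parameter ratio into bytes per parameter."""
    return tokens_per_param * avg_bytes_per_token


TOKENS_PER_PARAM = 20.0  # the ratio quoted in the post

# Hypothetical tokenizers with different compression (assumed values).
coarse = bytes_per_parameter(TOKENS_PER_PARAM, avg_bytes_per_token=4.0)
fine = bytes_per_parameter(TOKENS_PER_PARAM, avg_bytes_per_token=3.0)

# For a concrete string, the byte count is independent of any tokenizer.
sample_bytes = len("scaling laws".encode("utf-8"))
```

Under these assumed averages, the same 20 tokens per parameter means 80 bytes per parameter for one tokenizer and 60 for the other, which is the kind of mismatch the post attributes to measuring in tokens.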