Backlist — 27 May 2026 UTC

Skipped many near-duplicate AI agent launches to keep security, biology, hardware, markets, web engineering, and weird artifacts represented.

35.

The unlock was a self-improvement loop. We record production misses: unsupported fields, wrong predictions, and corrections. Codex then uses that context to autonomously create evals from production data, hillclimb against them, and open

by (Samay) · backlist 2026-05-27 · rubric 94.0
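
The loop described in item 35 (record misses, turn them into evals, hillclimb) can be sketched in a few lines. This is a minimal illustration under assumptions, not the author's pipeline: `Miss`, `build_evals`, `score`, and `hillclimb` are hypothetical names, and candidates are plain Python callables standing in for whatever Codex actually proposes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Miss:
    """One recorded production miss (hypothetical shape)."""
    input_text: str
    corrected_output: str

Extractor = Callable[[str], str]

def build_evals(misses: Iterable[Miss]) -> list[tuple[str, str]]:
    """Turn recorded misses into (input, expected) eval cases."""
    return [(m.input_text, m.corrected_output) for m in misses]

def score(candidate: Extractor, evals: list[tuple[str, str]]) -> float:
    """Fraction of eval cases the candidate gets exactly right."""
    if not evals:
        return 0.0
    return sum(candidate(x) == y for x, y in evals) / len(evals)

def hillclimb(baseline: Extractor,
              candidates: Iterable[Extractor],
              evals: list[tuple[str, str]]) -> tuple[Extractor, float]:
    """Greedy accept-if-better: keep a candidate only if it beats the best so far."""
    best, best_score = baseline, score(baseline, evals)
    for candidate in candidates:
        s = score(candidate, evals)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```

In this sketch a candidate replaces the baseline only when it scores strictly higher on evals built from real corrections, which is the property that lets the loop run without hand-written test cases.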
44.

MiniMax-M2 paper just dropped. The key focus of M2 is on something more agent-native. It trains on runnable workspaces and artifact-grounded rewards, then uses Forge to scale RL over long coding, app, search, and office-task trajectories.

by (alphaXiv) · backlist 2026-05-27 · rubric 92.0
50.

SGLang v0.5.12.post1 is live. This is a stability patch on top of v0.5.12, with 12 cherry-picks focused on DeepSeek V4, NIXL PD disaggregation, and Blackwell. DeepSeek V4: Fixed V4-Pro garbled text on single-token decode (B200/B300). Fixed

by (LMSYS Org) · backlist 2026-05-27 · rubric 91.0
52.

(x.com)

shipped @getbuzzr/dfs-engine 4.0 today: strict settlement contracts, typed invariant errors, hardened payout math for PrizePicks/Underdog-style DFS grading. 4 new companions also at 1.0: dfs-cli, dfs-react, dfs-provider-sportradar, dfs

by · backlist 2026-05-27 · rubric 91.0
54.

we just shipped sandbox-sdk v0.10.2 today: cloudflare tunnels support; mount R2 buckets directly from worker bindings; isolated exec() calls. small release, but a lot of quality-of-life improvements for agents running in containers

by (kate) · backlist 2026-05-27 · rubric 90.0
57.

ECHO paper + code are now live! We open-sourced a small SkyRL-based implementation of "world loss" for terminal-agent RL. GRPO trains on what the agent did. ECHO also learns from what the terminal said next. Same rollout. Same policy fo

by (Vaish Shrivastava) · backlist 2026-05-27 · rubric 88.0
58.

(t.co)

Gandalf code: https://github.com/Handshake-AI-Research/gandalf-the-grader … Blog post with details: https://joinhandshake.com/research/ai/gandalf-the-grader/ …

by (Anish Athalye) · backlist 2026-05-27 · rubric 88.0
59.

New: grep for exact matching. grep → keyword / regex matching; search → fine-grained semantic retrieval. Works across uploaded content, including text, PDFs (OCR), and audio/video (transcription). Give your agents both retrieval primitives t

by (Mixedbread) · backlist 2026-05-27 · rubric 88.0
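
The two retrieval primitives in item 59 differ in what counts as a match. The sketch below is an illustration only and does not use Mixedbread's API: `grep` is a plain regex scan, and `search` is a bag-of-words cosine ranking used here as a simple stand-in for real embedding-based semantic retrieval. `DOCS` and both function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory corpus: document name -> extracted text.
DOCS = {
    "a.txt": "Error 504: gateway timeout while calling billing service",
    "b.txt": "The payment request took too long and was aborted",
    "c.txt": "Release notes for the mobile app",
}

def grep(pattern: str, docs: dict[str, str]) -> list[str]:
    """Exact / regex matching: return documents whose text matches the pattern."""
    rx = re.compile(pattern)
    return [name for name, text in docs.items() if rx.search(text)]

def _bag(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, docs: dict[str, str], top_k: int = 1) -> list[str]:
    """Ranked retrieval: return the top_k documents most similar to the query."""
    q = _bag(query)
    ranked = sorted(docs, key=lambda name: _cosine(q, _bag(docs[name])), reverse=True)
    return ranked[:top_k]
```

An agent would reach for `grep` when it knows the literal string or pattern (an error code, an identifier) and for `search` when it only knows the topic.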
68.

(x.com)

Per @EpochAIResearch, the world's Blackwell GPUs can produce roughly 500M–20B output tokens per second today, depending on context length. Inference capacity is growing 3.4x/year. Token demand is growing 10x/year. Long-context workloads

by (Shanu Mathew) · backlist 2026-05-27 · rubric 88.0
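
The two growth rates quoted in item 68 imply that demand outpaces capacity by a factor of 10 / 3.4 ≈ 2.9 each year. A minimal sketch of that arithmetic, assuming both rates stay constant (the post does not claim this) and using a hypothetical starting ratio of 1.0:

```python
def demand_to_capacity_ratio(years: float,
                             capacity_growth: float = 3.4,
                             demand_growth: float = 10.0,
                             initial_ratio: float = 1.0) -> float:
    """Demand-to-capacity ratio after `years` of constant compound growth.

    initial_ratio is an assumed starting point, not a figure from the post.
    """
    return initial_ratio * (demand_growth / capacity_growth) ** years

# Print the relative gap for the first few years under these assumptions.
for year in range(4):
    print(year, round(demand_to_capacity_ratio(year), 2))
```

Only the relative gap is modeled here; the absolute 500M–20B tokens-per-second range from the post is not used, since the ratio depends on the growth rates alone.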
81.

(x.com)

Today, @MichaelElabd, @QuantumArjun, and I are excited to announce Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage,

by (Ronak Malde) · backlist 2026-05-27 · rubric 86.0
82.

AI attackers have terrible OPSEC. Use it against them. Hallucinate exposed services. Waste their tokens. Seed prompt-injection traps, canaries, and honeytokens where the attacker's LLM will read them. Have fun.

by (Juliano Rizzo) · backlist 2026-05-27 · rubric 86.0
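
The honeytoken idea in item 82 reduces to two steps: plant a unique, otherwise unused secret where an attacker's tooling will read it, then alert whenever that value shows up in your logs. A minimal sketch, assuming plain-text logs; `make_canary`, `decoy_config`, and `find_canary_hits` are hypothetical helper names, not part of any tool mentioned in the post.

```python
import secrets
from typing import Iterable

def make_canary(prefix: str = "canary") -> str:
    """Generate a unique token that no legitimate client should ever send."""
    return f"{prefix}-{secrets.token_hex(8)}"

def decoy_config(token: str) -> str:
    """Render a fake config file that exposes the canary as an API key."""
    return f"# internal deploy notes\nAPI_KEY={token}\n"

def find_canary_hits(log_lines: Iterable[str], tokens: set[str]) -> list[tuple[int, str]]:
    """Return (line_index, token) for every log line that contains a planted token."""
    return [
        (index, token)
        for index, line in enumerate(log_lines)
        for token in tokens
        if token in line
    ]
```

Because the token is random and never issued to a real client, any hit is a high-signal indicator that the decoy was read and acted on.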
85.

TyphoonPWN 2026 Unpwned. Found a bug in the "ipTime Router WAN PreAuth Remote Code Execution" category ($10,000) using an LLM and reported it in February for TyphoonPWN 2026. Unfortunately, it was patched in March before the event. #TyphoonC

by (Satoki@Kn0wl3dg3) · backlist 2026-05-27 · rubric 86.0