Backlist — 23 May 2026 UTC

Balanced toward durable technical and policy material while limiting agent/LLM items to cases with concrete artifacts, measurements, or consequences.

32.

// Adapt the Interface, Not the Model // I am fascinated by the results across my cheap-model-plus-good-harness builds. This new paper also shows good signs of the code-as-agent-harness thesis. The idea is really simple. Do not touch the

by (elvis) · backlist 2026-05-23 · rubric 92.0
35.

"Code as Agent Harness" Agents are becoming less like chatbots that write code and more like systems that run on code. This new Meta paper reframes code as the harness around an agent, the executable layer for reasoning, acting, memory, v

by (alphaXiv) · backlist 2026-05-23 · rubric 91.0
36.

Weekend project. This is my personalized Qwen Harness (that I made over the last 2 days) running locally on my Mac. 30 tokens/second. Current SWE score of 74.67% on a smaller subset of problems. I will keep on improving the harness to sque

by (Saurabh Kumar) · backlist 2026-05-23 · rubric 89.0
38.

Congrats to the Webwright team https://microsoft.github.io/Webwright at @MSFTResearch for taking the #1 spot on Odysseys, a highly challenging benchmark for long-horizon web agents: https://odysseys-website.pages.dev/leaderboard Ody

by (Russ Salakhutdinov) · backlist 2026-05-23 · rubric 88.0
44.

nanobot × CLI-Anything: nanobot now becomes your actual computer-use coworker. Instead of just talking about tasks, it can now directly operate the apps where real work happens, from 3D modeling and design tools to office workflows via C

by (Chao Huang) · backlist 2026-05-23 · rubric 88.0
59.

"Tokenisation via Convex Relaxations" Most LLM tokenizers still use BPE, a greedy merge algorithm that can waste vocab slots on locally good but globally suboptimal tokens. This paper turns tokenizer training into a linear program, then r

by (alphaXiv) · backlist 2026-05-23 · rubric 82.0
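Item 59 contrasts the paper's linear-program formulation with standard BPE. To make "greedy merge" concrete, here is a minimal sketch of classic BPE training. This is not the paper's convex-relaxation method. It assumes a toy word-frequency dict as input, no end-of-word marker, and ties broken by first occurrence; `train_bpe` is a hypothetical helper name. Each round commits to the single most frequent adjacent pair and never revisits that choice, which is the locally-good-but-globally-suboptimal behavior the blurb describes.

```python
from collections import Counter


def train_bpe(words: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Toy greedy BPE trainer (illustrative sketch, not the paper's LP method).

    `words` maps each word to its corpus frequency. Returns the ordered list
    of merges; each merge would occupy one vocabulary slot.
    """
    # Start with every word split into single-character symbols.
    vocab = {tuple(w): c for w, c in words.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        # Greedy step: take the locally most frequent pair, no lookahead.
        # max() returns the first maximal key, so ties go to first occurrence.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab: dict[tuple[str, ...], int] = {}
        for symbols, count in vocab.items():
            merged: list[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges
```

On the toy corpus {"low": 5, "lower": 2, "newest": 6, "widest": 3}, the first two merges are ("e", "s") and then ("es", "t"). Each merge consumes a vocab slot as soon as it is locally best, whether or not it helps the final segmentation; an optimization-based trainer can instead weigh candidate tokens jointly.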
65.

I adopted @steipete's coding workflow last year. You just have to talk to your agents. So it's super important to know when and where an agent wants to talk to you! This is what I built cmux around. When you have a lot of codexes/c

by (Lawrence Chen) · backlist 2026-05-23 · rubric 81.0
89.

Thought my GPU was cooked. Nope. Turns out a random Discord update turned on Clips by default. Turning it off made me go from 100% usage to 7%. Go turn that shit off and save yourself the headache. Hope it helps.

by (Venalis) · backlist 2026-05-23 · rubric 78.0
90.

Some Polymarket builder ideas:
- trading spreads of contracts directly: yes June 30th and no Dec 30, without the double spread
- RFQ for larger bundles, like a mm offloading their positions
- OTC trading in a low-trust way

by (Cajetan) · backlist 2026-05-23 · rubric 76.0