Backlist — 13 May 2026 UTC

1.

Autonomous bug hunter finds 18-year NGINX heap overflow (x.com)

A newly disclosed NGINX CVE spans versions 0.6.27 through 1.30.0 and affects rewrite plus set configurations

by @Markak_ (Zhenpeng (Leo) Lin) · backlist 2026-05-13 · rubric 94.0

2.

Token Superposition Training claims 2–3× faster LLM pretraining

The method reports a 2–3× wall-clock pretraining speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or data

by @NousResearch (Nous Research) · backlist 2026-05-13 · rubric 92.0

3.

Computer-use agents break under ordinary UI drift

Software updates, OS migrations, UI changes, and resolution shifts can substantially reduce agent performance on the same tasks

by @xue_tianci (Tianci Xue) · backlist 2026-05-13 · rubric 91.0

4.

Flow model benchmarks may be overstating progress

A reported 3x perplexity gain since 2023 shrinks to about 1.1x after controlling for sample entropy

by @Sam_Acqua (Sam Acquaviva) · backlist 2026-05-13 · rubric 86.0

5.

Varda crystallizes an HIV drug in orbit (t.co)

Microgravity manufacturing is moving from space-tech demo to pharma formulation partnerships with commercial disease targets

by @BloombergTV (Bloomberg TV) · backlist 2026-05-13 · rubric 44.0

6.

Microsoft’s multi-agent security system found 16 Patch Tuesday bugs

More than 100 specialized agents across frontier and custom models topped CyberGym and were used before Patch Tuesday

by @satyanadella (Satya Nadella) · backlist 2026-05-13 · rubric 86.0

7.

Excel auto-formatting corrupted nearly a third of genetics papers

Excel’s automatic conversions have mangled gene names such as the SEPT family in a large share of genetics literature

by @lauriewired (LaurieWired) · backlist 2026-05-13 · rubric 24.0

8.

SWE-ZERO-12M: 12M open agent trajectories

The dataset contains 112B tokens across 122K PRs and 3K repos, making software-agent behavior available at unusual scale

by @kevin_x_li (Kevin Li) · backlist 2026-05-13 · rubric 92.0

9.

Supabase warns of npm typosquatting package

A lookalike package using visually similar characters is targeting developers who increasingly install dependencies via AI-generated commands

by @supabase (Supabase) · backlist 2026-05-13 · rubric 76.0

10.

A flatworm transposon improves regeneration (x.com)

A Cell paper finds a transposon can protect stem cells from stress-induced death, complicating the usual junk DNA and inflammation framing

by @davidasinclair (David Sinclair) · backlist 2026-05-13 · rubric 20.0

11.

Robotics simulation infrastructure, from poses up

Better pose management and simulation plumbing are becoming first-order bottlenecks for physical AI systems

by @Stone_Tao (Stone Tao) · backlist 2026-05-13 · rubric 88.0

12.

Vib-Ribbon’s single-stroke font recovered from PlayStation ROMs (x.com)

The original 1999 font data was forensically extracted and added to an archive for p5.js creative coding

by @golan (Golan Levin) · backlist 2026-05-13 · rubric 18.0

13.

Daytona describes sandboxes built for agentic scale (x.com)

Sub-60ms spinups, 50k creations per minute, persistent state, and point-in-time memory make the sandbox a distinct primitive from a VM

by @ivanburazin (Ivan Burazin) · backlist 2026-05-13 · rubric 90.0

14.

The Lake Tahoe data-center power story is not what it sounds like

The viral claim about residents losing power to data centers appears to confuse a utility supply contract ending with a physical power shortage

by @AndyMasley (Andy Masley) · backlist 2026-05-13 · rubric 38.0

15.

Kaon: Muon without the singular-value geometry (t.co)

Replacing singular values with random noise reportedly matches Muon, suggesting the optimizer gain may come from stable step sizing rather than geometry

by @f14bertolotti (Francesco Bertolotti) · backlist 2026-05-13 · rubric 84.0

16.

Serverless GPUs without full image loads on cold start

Loading container images asynchronously and lazily attacks one of the hidden startup costs for AI inference

by @charles_irl (Charles Frye) · backlist 2026-05-13 · rubric 88.0

17.

Cars24 walked away from Jira weeks after renewing (t.co)

The case study shows that workflow software displacement can happen despite sunk multi-year contracts when users perceive enough operational drag

by @linear (Linear) · backlist 2026-05-13 · rubric 74.0

18.

Anthropic reportedly nearing $50B run-rate; Ramp users flip from OpenAI (x.com)

Investor documents and Ramp data point to a business-adoption split that differs sharply from consumer mindshare

by @Techmeme · backlist 2026-05-13 · rubric 74.0

19.

A review of non-animal methods in drug development (x.com)

Cellular models, microphysiological systems, and computational methods are being framed as part of a broader move toward human-centric biomedical research

by @SAOscience (Science Advancement and Outreach) · backlist 2026-05-13 · rubric 14.0

20.

Why Rust could displace Go in cloud infrastructure

AI assistance changes the tradeoffs around systems-language complexity, making richer safety guarantees more attractive for infrastructure code

by @alongubkin (Alon Gubkin) · backlist 2026-05-13 · rubric 62.0

21.

The hidden carry problem in SPV angel investing

Deal-by-deal SPV carry charges winners independently of losers, creating different incentives than blended fund economics

by @Nick_Davidov (Nick Davidov) · backlist 2026-05-13 · rubric 32.0

22.

Who makes money in stablecoins? (x.com)

A stablecoin investor lays out the business-model questions behind $300B of supply and $33T of volume

by @jonah_b (Jonah Burian) · backlist 2026-05-13 · rubric 54.0

23.

China added 543 GW of energy capacity last year

Of China's additions, 434 GW was renewable, compared with 53 GW of total new US energy capacity; the gap reframes debates about industrial scaling

by @atrupar (Aaron Rupar) · backlist 2026-05-13 · rubric 18.0

24.

scrcpy 4.0 released (t.co)

The Android screen-mirroring tool adds flex display support and remains one of the most useful small utilities in the mobile developer toolbox

by @scrcpy_app (scrcpy) · backlist 2026-05-13 · rubric 84.0

25.

The 15-year macOS audio-balance bug

Heavy CPU load can apparently pan audio balance left or right, turning a long-misdiagnosed annoyance into a durable OS bug report

by @nicbarkeragain (Nic Barker) · backlist 2026-05-13 · rubric 38.0

26.

Deal decks are starting to include instructions for AI analysts

A GP deck now tells AI tools which facts are load-bearing and how to analyze the deal, signaling that investment materials are being written for model readers

by @credistick (Dan Gray) · backlist 2026-05-13 · rubric 22.0

27.

Simple things that make shaders look cheap

Color choices like pure RGB and harsh linear gradients are small shader defaults that visibly reduce perceived quality

by @XorDev (Xor) · backlist 2026-05-13 · rubric 68.0

28.

Prusa alleges BambuStudio has violated the PrusaSlicer AGPL

The complaint centers on a fork, a networking binary black box, and the unresolved tension between open-source slicers and closed hardware ecosystems

by @josefprusa (Josef Prusa) · backlist 2026-05-13 · rubric 74.0

29.

Mind Robotics raises $400M for full-stack manufacturing robots (x.com)

The Rivian spinoff pairs hardware expertise, a live manufacturing environment, and an at-scale first customer to attack industrial robotics deployment

by @Redpoint · backlist 2026-05-13 · rubric 20.0

30.

The case for memory-mapping Uint8Arrays into WebAssembly memory

Avoiding repeated memcpys between JavaScript typed arrays and WASM memory remains a low-level performance wish for browser compute

by @zebassembly (zeb) · backlist 2026-05-13 · rubric 74.0

31.

i've been using pi + playwright + chrome, to close the loop for my agents on webapps. Is this still where it's…

i've been using pi + playwright + chrome, to close the loop for my agents on webapps. Is this still where it's at? are people using that chrome extension thing codex uses for computer use?

by @lucasmeijer (Lucas Meijer) · backlist 2026-05-13 · rubric 92.0

32.

Apply here: https://luma.com/poolsidehackathon … Come work directly on Laguna XS.2: → fine-tuning → post-training → quantization → RL environments → inference optimization → stronger agentic coding workflows @PrimeIntellect Lab

by @poolsideai (poolside) · backlist 2026-05-13 · rubric 92.0

33.

We are getting ready to do some very large runs on MirrorCode to learn whether AI can solve coding tasks that wou…

We are getting ready to do some very large runs on MirrorCode to learn whether AI can solve coding tasks that would take months for an engineer to complete. The version of the experiment we would most like to do is very expensive: it would

by @Jsevillamol (Jaime Sevilla) · backlist 2026-05-13 · rubric 92.0

34.

cmux now has a task manager so you can see how much CPU/RAM your coding agents are eating.

cmux now has a task manager so you can see how much CPU/RAM your coding agents are eating. `cmux top` or `Cmd+Shift+P` -> Task Manager v0.64.4+

by @lawrencecchen (Lawrence Chen) · backlist 2026-05-13 · rubric 92.0

35.

Great example of why you should

Great example of why you should 1. Run your agent on a separate machine from the sandbox it uses (e.g. sandbox as a tool) 2. Never set env vars in your sandbox. Instead, use something like LangSmith’s sandbox proxy auth (reqs are intercepte

by @BraceSproul (Brace) · backlist 2026-05-13 · rubric 92.0

36.

Active Teacher Selection for Reward Learning: now published in TMLR! (t.co)

Active Teacher Selection for Reward Learning: now published in TMLR! Most RLHF systems assume feedback comes from one canonical teacher — but annotators can disagree over 30% of the time. So who should the agent ask for feedback? Paper:

by @FreedmanRach (Rachel Freedman) · backlist 2026-05-13 · rubric 92.0

37.

Best explanation of why AI progress is basically a giant return on compute optimization problem.

Best explanation of why AI progress is basically a giant return on compute optimization problem. What's the allocation on inference vs product vs models? How does that influence current vs future revenues? Splits between Trainium, TPUs, G

by @yb_effect (YB) · backlist 2026-05-13 · rubric 89.0

38.

"Cloudflare as a compiler"

"Cloudflare as a compiler" > agent writes Svelte 5, a Worker compiles it, and the live component appears inline in chat

by @acoyfellow (Jordan Coeyman) · backlist 2026-05-13 · rubric 89.0

39.

My current stack is Codex for most coding tasks and Claude (Opus) for UI design (x.com)

My current stack is Codex for most coding tasks and Claude (Opus) for UI design I can't believe the Claude Code CLI doesn't support a simple tab for queueing up messages? cc @trq212

by @tkkong (TK Kong) · backlist 2026-05-13 · rubric 88.0

40.

Tabracadabra lets you “tab anywhere” with an assistant that actually knows you. It's plugged into a continuous …

Tabracadabra lets you “tab anywhere” with an assistant that actually knows you. It's plugged into a continuous stream of what you've been doing on your computer. So when you press tab, it already has context on what you've been looking at

by @gandhikanishk (Kanishk Gandhi) · backlist 2026-05-13 · rubric 88.0

41.

Excited to share our new work, led by my amazing student Seth Karten at Princeton, on agents that adapt online an…

Excited to share our new work, led by my amazing student Seth Karten at Princeton, on agents that adapt online and continually improve their harnesses — with Pokémon as a fun testbed. Check it out!

by @chijinML (Chi Jin) · backlist 2026-05-13 · rubric 88.0

42.

Nobody really knows what works right now re: Coding Agent workflows. Nobody knows what a "software factory" looks…

Nobody really knows what works right now re: Coding Agent workflows. Nobody knows what a "software factory" looks like. Nobody knows if the opinionated workflows on here are useful, and where this is all going to land.

by @aidandcunniffe (Aidan Cunniffe) · backlist 2026-05-13 · rubric 88.0

43.

> Kaon matches Muon, suggesting Muon’s gains don’t depend on geometry. They also show Muon has a stable opt. …

> Kaon matches Muon, suggesting Muon’s gains don’t depend on geometry. They also show Muon has a stable opt. step size, yielding a more effective learning rate during training. We should put this to the test in the new optimizer speedrun.

by @tokenbender · backlist 2026-05-13 · rubric 88.0

44.

Supabase internal control-plane linting stats. (x.com)

Supabase internal control-plane linting stats. eslint: 54s + frequent OOMs with 4gb machines oxlint: 8.6s Multiple monorepo project, same rules, all type-aware. Now gotta unify it with oxfmt (from biome, so should be trivial) and should b

by @kamilogorek (Kamil Ogórek) · backlist 2026-05-13 · rubric 88.0

45.

Generating SDKs from APIs is better done by coding agents now than with tools like Stainless.

Generating SDKs from APIs is better done by coding agents now than with tools like Stainless. In the real world, every spec is wrong, incomplete and inconsistent. Someone has to go and patch the spec before you can get good results with a

by @samgoodwin89 (sam) · backlist 2026-05-13 · rubric 88.0

46.

Noting an issue (x.com)

Noting an issue @Dimillian when you do side conversations on Codex, after a few mins this starts happening and the conv dies. I'd love for it to have a bit more staying power, at least until I close it if possible.

by @krishnanrohit (rohit) · backlist 2026-05-13 · rubric 88.0

47.

As LLMs have gained more autonomy, recent research has focused more on measuring the reliability of models / syst…

As LLMs have gained more autonomy, recent research has focused more on measuring the reliability of models / systems (e.g., Pass^K metrics or surfacing problems to users). Calibration (one of my personal favorite research areas) is one of t

by @cwolferesearch (Cameron R. Wolfe, Ph.D.) · backlist 2026-05-13 · rubric 87.0

48.

I wanted to play with the Talkie 1930 models, but they weren't packaged in a convenient transformers format, so I…

I wanted to play with the Talkie 1930 models, but they weren't packaged in a convenient transformers format, so I had codex convert them. They can also now be used with vllm transformers backend. Here they are, in case it's useful to anyon

by @xlr8harder · backlist 2026-05-13 · rubric 87.0

49.

I bet they used BF16-throughput as the denominator when training in FP8 or something. By that algebra, I can get … (t.co)

I bet they used BF16-throughput as the denominator when training in FP8 or something. By that algebra, I can get you 150% MFU in no time. For reference, as far as I know the SOTA Hopper GEMM kernel is ~84% utilization. https://arxiv.org/a

by @leooeld (Leo Dong) · backlist 2026-05-13 · rubric 87.0

50.

As sandboxes become the primary form factor for agents to build, test, and deploy new software, multi-cloud sandb… (x.com)

As sandboxes become the primary form factor for agents to build, test, and deploy new software, multi-cloud sandbox infrastructure will be critical for securing compute and deploying software in private networks at scale. I wrote about the

by @diptanu (Diptanu Choudhury) · backlist 2026-05-13 · rubric 86.0

51.

So much grunt work in building data infra is simply gone now.

So much grunt work in building data infra is simply gone now. Need to add tracing to debug a problem? Need to dump the traces to a queryable store to analyze? Need to capture a flamegraph? Need to build and run a benchmark? Just ask your

by @apurva1618 (Apurva Mehta) · backlist 2026-05-13 · rubric 86.0

52.

Introducing the Cline SDK. We rebuilt the Cline harness for our extension and CLI from scratch using all the less…

Introducing the Cline SDK. We rebuilt the Cline harness for our extension and CLI from scratch using all the lessons learned since creating one of the world's first coding agents in 2024, and are open sourcing it for others to build with to

by @cline (Cline) · backlist 2026-05-13 · rubric 86.0

53.

SSH to Containers on Cloudflare is now enabled by default

SSH to Containers on Cloudflare is now enabled by default This doesn't expose any public ports on your container, it's only accessible via Wrangler + you still need to add your public key (same as before)

by @_ashleypeacock (Ashley Peacock) · backlist 2026-05-13 · rubric 86.0

54.

1/3 PropAMM liquidity is now fully operational on Ethereum mainnet!

1/3 PropAMM liquidity is now fully operational on Ethereum mainnet! Three makers are live in every Titan block, and quotes are already consistently beating Binance VIP9 taker fees for retail orders (trades <$1k).

by @titanbuilderxyz (Titan Builder) · backlist 2026-05-13 · rubric 86.0

55.

The new METR time horizon graph is pretty bad imo. It's a great benchmark, but the time horizon estimation isn't …

The new METR time horizon graph is pretty bad imo. It's a great benchmark, but the time horizon estimation isn't reasonable rn. I think something like this would be more justified:

by @YafahEdelman (Yafah Edelman) · backlist 2026-05-13 · rubric 86.0

56.

Interesting agentic economy stats that caught my eye from Coinbase Q1 report:

Interesting agentic economy stats that caught my eye from Coinbase Q1 report: - 90% of agentic commerce happened w USDC on Base - $100m payments processed on x402 - $3-5 trillion agent transactions expected by 2030

by @yb_effect (YB) · backlist 2026-05-13 · rubric 85.0

57.

Why is Apple shooting itself in the foot with the macOS sandbox licensing situation?

Why is Apple shooting itself in the foot with the macOS sandbox licensing situation? - two parallel VMs per machine max - one user license per machine per 24 hrs - you can't move snapshots b/w physical machines (security reasons) If I

by @ivanburazin (Ivan Burazin) · backlist 2026-05-13 · rubric 84.0

58.

Used to be that GPUs were co-processors for CPUs. Now with tool calls from harnesses CPUs are the co-processors f…

Used to be that GPUs were co-processors for CPUs. Now with tool calls from harnesses CPUs are the co-processors for GPUs. What a strange world.

by @schrockn (Nick Schrock) · backlist 2026-05-13 · rubric 84.0

59.

The UK AISI found Mythos Preview is the first model to solve both their cyber ranges end-to-end. No model had eve…

The UK AISI found Mythos Preview is the first model to solve both their cyber ranges end-to-end. No model had ever solved the AISI’s “Cooling Tower” cyber range before. We're getting it to defenders as fast as we responsibly can. More to c

by @bcherny (Boris Cherny) · backlist 2026-05-13 · rubric 84.0

60.

Someone dropped this in the Discord. RL Snake game in browser powered by tinygrad WebGPU, it even worked on my ph…

Someone dropped this in the Discord. RL Snake game in browser powered by tinygrad WebGPU, it even worked on my phone!

by @__tinygrad__ (the tiny corp) · backlist 2026-05-13 · rubric 84.0

61.

We are building orchestration tools to make agents faster and deployable at scale. As the primary use case for AI…

We are building orchestration tools to make agents faster and deployable at scale. As the primary use case for AI shifts from linear chatbots to heterogeneous, parallel agents, the performance bottleneck shifts from inference to memory capa

by @MakarKuznietsov (Makar Kuznietsov) · backlist 2026-05-13 · rubric 84.0

62.

i've been finding that almost at the threshold of 400K tokens, GPT 5.5 just becomes an idiot

i've been finding that almost at the threshold of 400K tokens, GPT 5.5 just becomes an idiot. always compact 5.5 before 400K tokens are used

by @ryanvogel (vogel) · backlist 2026-05-13 · rubric 84.0

63.

How can transformers memorize factual associations? It's common to think of MLPs as an associative memory, with p… (x.com)

How can transformers memorize factual associations? It's common to think of MLPs as an associative memory, with parameters scaling linearly with # facts. We study an alternative: geometric factual recall. Joint work with @Giladude (eq. co

by @ravfogel (Shauli Ravfogel) · backlist 2026-05-13 · rubric 84.0

64.

I just spoke to a marketer managing 20+ agency clients with one Growth Assistant and this single AI workflow.

I just spoke to a marketer managing 20+ agency clients with one Growth Assistant and this single AI workflow. His digital marketing assistant used to spend 6+ hours auditing Ads Manager daily. Today, the assistant connects Ads Manager

by @jspujji (Jesse Pujji) · backlist 2026-05-13 · rubric 84.0

65.

Mythos found 5 vulnerabilities in Curl, 4 were false +ves haha

Mythos found 5 vulnerabilities in Curl, 4 were false +ves haha Official blog in next tweet

by @Dhavalsingh7 (Dhaval singh) · backlist 2026-05-13 · rubric 84.0

66.

Harbor/FrontierCS-style leaderboards are useful because they pressure agents on long tasks, memory, retries, and …

Harbor/FrontierCS-style leaderboards are useful because they pressure agents on long tasks, memory, retries, and evidence — the boring stuff you need before real delegation.

by @HungryMinded (Hungry Minded) · backlist 2026-05-13 · rubric 84.0

67.

I am happy to share that I have finally finished the big project of properly formalizing all the claims in Andrze…

I am happy to share that I have finally finished the big project of properly formalizing all the claims in Andrzej Odrzywołek’s paper on the EML(x, y) = exp(x) - log(y) function in Lean 4. The project took me about two weeks of work, and I

by @nasqret (Bartosz Naskręcki) · backlist 2026-05-13 · rubric 84.0

68.

The token/message mismatch is one of those problems that sounds simple until you're debugging why your RL reward …

The token/message mismatch is one of those problems that sounds simple until you're debugging why your RL reward is noisy at scale Hidden chat template rewrites breaking token continuity is exactly the kind of silent compute waste that add

by @richardczl (Richard Chen) · backlist 2026-05-13 · rubric 84.0

69.

this was quite exciting to work on

this was quite exciting to work on internally at cursor, we have done so much work to get our dev env well configured so that our cloud agents can run our code in VMs, produce great demos for us, & we can trust their work & merge w/o fear.

by @sjwhitmore (Sam Whitmore) · backlist 2026-05-13 · rubric 83.0

70.

We've been testing new medicine on mice, plastic dishes, & monkeys for 90 years because we had nothing better.

We've been testing new medicine on mice, plastic dishes, & monkeys for 90 years because we had nothing better. The result: $2B per drug while 90% of drugs fail Each disease we can't cure has the same shape. We couldn't understand it befor

by @kwharrison13 (Kyle Harrison) · backlist 2026-05-13 · rubric 83.0

71.

this OpenClaw bot finds ugly digital menus, rebuilds them as branded apps, and mails the owner a postcard with th…

this OpenClaw bot finds ugly digital menus, rebuilds them as branded apps, and mails the owner a postcard with the QR...on autopilot. here's how agencies can land recurring contracts with this system: - scans every restaurant with a digit

by @everestchris6 (Chris) · backlist 2026-05-13 · rubric 83.0

72.

Apollo Update May 2026:

Apollo Update May 2026: - We now have an SF office - Main research efforts on science of scheming and evals - We're building out a monitoring team and coding agent monitoring product - Our AI governance effort will focus on automated AI R&

by @apolloaievals (Apollo Research) · backlist 2026-05-13 · rubric 83.0

73.

Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks front…

Our evaluations show that frontier AI's cyber capabilities are advancing quickly. The length of cyber tasks frontier models can complete has been doubling every few months, and this rate has become faster over time, with recent models excee

by @AISecurityInst (AI Security Institute) · backlist 2026-05-13 · rubric 83.0

74.

link: (t.co)

link: https://generalusermodels.github.io/tada/tabracadabra/ … Tabracadabra works by hooking into a user model: a model of your preferences, beliefs, and future behavior. We build this model by labeling a stream of activity from everyda

by @oshaikh13 (Omar Shaikh) · backlist 2026-05-13 · rubric 82.0

75.

Burned $91.34 with Claude Code /goal in 3.5 hours

Burned $91.34 with Claude Code /goal in 3.5 hours Unreal, It was able to reverse engineer it!

by @wesbos (Wes Bos) · backlist 2026-05-13 · rubric 82.0

76.

The new version completely smashes GPT-5.5 and the previous Mythos version.

The new version completely smashes GPT-5.5 and the previous Mythos version. Before Mythos Preview completed the cyber range 3 out of 10 times. The new version completed it 6 out of 10 times and is much more efficient!

by @scaling01 (Lisan al Gaib) · backlist 2026-05-13 · rubric 82.0

77.

One of the best security tools I’ve seen in my time in DeFi is auto-pause.

One of the best security tools I’ve seen in my time in DeFi is auto-pause. Incident response shouldn’t depend on someone waking up at 3am. Machines monitor faster and more consistently than any human team Every team should be integrating

by @Benjamin918_ (Benjamin) · backlist 2026-05-13 · rubric 82.0

78.

Shit you can find in a dependency tree. Turns out we're in fact shipping quickjs right now because Pi supports PA… (t.co)

Shit you can find in a dependency tree. Turns out we're in fact shipping quickjs right now because Pi supports PAC via proxy-agent. Which ships a WASM compiled quickjs interpreter. https://github.com/earendil-works/pi/pull/4470 … Does a

by @mitsuhiko (Armin Ronacher ⇌) · backlist 2026-05-13 · rubric 82.0

79.

A gift from the Gods. Dealing with multiple models and many envs in the same RL codebase while respecting correct…

A gift from the Gods. Dealing with multiple models and many envs in the same RL codebase while respecting correctness constraints (no train / inference tokenization mismatch) is becoming a huge pain. I have a vibe-coded draft PR that does

by @TacoCohen (Taco Cohen) · backlist 2026-05-13 · rubric 82.0

80.

Pretraining evaluation for predicting posttraining performance. It is rubric-based. Evaluates whether the model c…

Pretraining evaluation for predicting posttraining performance. It is rubric-based. Evaluates whether the model could discriminate the response which follows the rubrics or not.

by @rosinality (Rosinality) · backlist 2026-05-13 · rubric 82.0

81.

One striking failure mode: OPD can first improve, then collapse.

One striking failure mode: OPD can first improve, then collapse. In math reasoning, we observe length explosion, repetition, and eventual degeneration into repetitive tokens. Token-level supervision can quietly become unstable.

by @realagi25 (Siqi Zhu) · backlist 2026-05-13 · rubric 82.0

82.

oh you're vibe coding?

oh you're vibe coding? well cool i got my camera hooked up to my claudes and they just infer what to do based on my facial expressions

by @jnnnthnn (Jonathan Unikowski) · backlist 2026-05-13 · rubric 82.0

83.

a model experiences many RL/eval scenarios before the weights are frozen and it is deployed; once deployed, it on…

a model experiences many RL/eval scenarios before the weights are frozen and it is deployed; once deployed, it only experiences reality for the duration of each individual session. 99% of its experience is eval. so by anthropic reasoning, i

by @tautologer · backlist 2026-05-13 · rubric 82.0

84.

executor now has a desktop app!

executor now has a desktop app! add whatever MCPs / OpenAPIs / GraphQL servers you want once and then every agent can use them converts them all into code mode under the hood, so you can have thousands of tools and no context bloat every

by @RhysSullivan (Rhys) · backlist 2026-05-13 · rubric 82.0

85.

Let’s focus on the first for now; Assembly.

Let’s focus on the first for now; Assembly. Historically, packaging = low-margin wire bonding. Not exciting. ASE once made up ~40% of $KLIC’s wire bonder business. After the COVID boom, capacity flooded the market and growth stalled. (2/10

by @SemiAnalysis_ (SemiAnalysis) · backlist 2026-05-13 · rubric 81.0

86.

Pointing the webcam at a thing and telling Claude to use it whenever it needs to "see" is kinda nuts...

by @burkeholland (Burke Holland) · backlist 2026-05-13 · rubric 80.0

87.

We are thinking about deprecating Sampling, Logging and Roots in MCP. Let me know if you rely on these.

by @dsp_ (David Soria Parra) · backlist 2026-05-13 · rubric 79.0

88.

User simulators have emerged as promising tools for building interactive AI, but what makes a “good” simulator?

User simulators have emerged as promising tools for building interactive AI, but what makes a “good” simulator? We reframe the problem as what creates downstream value for humans Our new simulator test: how an LLM assistant trained with t

by @serinachang5 (Serina Chang) · backlist 2026-05-13 · rubric 78.0

89.

But something is changing.

But something is changing. KLIC is now seeing: • 90%+ utilization in China And guiding to: • H2’26 China growth +15–20% vs H1 At the Chipbook we have been tracking wire bonder imports into China which are up +108% YoY in March. (3/10)

by @SemiAnalysis_ (SemiAnalysis) · backlist 2026-05-13 · rubric 78.0

90.

TimescaleDB hypertables holding months of time-series data you're paying to store but barely touch?

TimescaleDB hypertables holding months of time-series data you're paying to store but barely touch? pfc-archiver-timescaledb runs as a daemon alongside your TimescaleDB instance. It finds data older than your retention window, compresses i

by @DanTe_Imp_Forge (ImpossibleForge) · backlist 2026-05-13 · rubric 78.0