Backlist — 12 May 2026 UTC

1.

Mini Shai-Hulud supply-chain worm crosses from npm into PyPI (x.com)

A credential-stealing package worm has moved beyond npm into PyPI, compromising high-download artifacts including opensearch-project, mistralai, and guardrails-ai

by @SocketSecurity (Socket) · backlist 2026-05-12 · rubric 96.0

2.

Google reports first AI-developed zero-day exploit seen in the wild

Google Threat Intelligence said it detected a threat actor using an AI-developed zero-day exploit before a planned wider attack could land

by @elder_plinius (Pliny the Liberator) · backlist 2026-05-12 · rubric 92.0

3.

Condé Nast tells brands to operate as if search traffic will be zero (x.com)

A major publisher is planning around the end of search and social referral economics rather than treating traffic declines as a temporary cycle

by @tbpn (TBPN) · backlist 2026-05-12 · rubric 24.0

4.

TabPFN-3: tabular foundation model scales to 1M rows on one H100

Prior Labs claims no-training tabular prediction at enterprise scale with 10–1000x faster inference and support for million-row datasets

by @prior_labs (Prior Labs) · backlist 2026-05-12 · rubric 88.0

5.

Pakistan’s unofficial solar buildout may rival its entire grid

Imported solar panels totaling 51.5 GW suggest Pakistan is building a decentralized solar economy far larger than official net-metering statistics show

by @EVCurveFuturist (Chris Meder) · backlist 2026-05-12 · rubric 42.0

6.

Distributed inference is becoming a hard systems problem

Modern inference now spans reasoning models, agents, KV caches, heterogeneous hardware, and routing rather than a single request running on a single machine

by @mansourkaram (Mansour Karam) · backlist 2026-05-12 · rubric 92.0

7.

Programming cells with RNA language models

A new line of work extends language-model approaches from decoding 5′ UTR regulatory grammar toward designing RNA programs that control cellular behavior

by @lecong (CL • Le Cong) · backlist 2026-05-12 · rubric 78.0

8.

Long Lake’s $6.3B AI-driven take-private of Amex GBT (x.com)

A 100-year-old public travel company is being taken private with an explicit plan to transform operations through AI rather than merely add software tools

by @eladgil (Elad Gil) · backlist 2026-05-12 · rubric 78.0

9.

Porting SQLite’s parser to JavaScript made it faster than JS SQL parsers

A native JavaScript port of SQLite’s parser reportedly beats existing JS and WASM SQL parsers by 2.5x to 200x depending on the comparison

by @jitl (Jake) · backlist 2026-05-12 · rubric 87.0

10.

Only 28 cents of each new AI dollar is net-new IT budget (x.com)

Enterprise AI spending appears mostly reallocated from software, services, headcount, BPO, and license consolidation rather than added as fresh budget

by @caro_milanesi (Carolina Milanesi) · backlist 2026-05-12 · rubric 78.0

11.

Soohak: 439 research-level math problems curated by mathematicians (t.co)

A new benchmark of novel research math problems from 64 mathematicians has frontier models scoring under 30%, beyond saturated olympiad-style tests

by @seungonekim (Seungone Kim) · backlist 2026-05-12 · rubric 91.0

12.

Apple’s Wi-Fi attack surface is unusually hardened

Apple’s application-processor-side Wi-Fi stack now combines mitigations like MIE and the XZM allocator in ways that make exploitation harder than on other platforms

by @defendtheworld (Alex Rad) · backlist 2026-05-12 · rubric 68.0

13.

Bristol Myers and Hengrui sign $15.2B multi-asset biotech collaboration (t.co)

The deal covers 13 early-stage oncology, hematology, and immunology programs, making it one of the largest China–global biotech alliances to date

by @chuminhua432 (Minhua Chu) · backlist 2026-05-12 · rubric 20.0

14.

Generating manufacturable RF designs from target S-parameters (x.com)

A diffusion model produced two-layer RF designs with vias and closed-loop EM verification, with fabricated filters used to validate the approach

by @_i_am_arya (Arya Hezarkhani) · backlist 2026-05-12 · rubric 62.0

15.

A history of activation checkpointing APIs in PyTorch

Activation checkpointing is central to training large models, and its PyTorch API history exposes the tradeoffs between memory, recomputation, and usability

by @ezyang (Edward Z. Yang) · backlist 2026-05-12 · rubric 82.0
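
The memory-versus-recomputation tradeoff named above can be illustrated with a toy example. This is a minimal sketch in plain Python, not PyTorch's API and not taken from the linked post: the scalar tanh "layers", the function names, and the simple way of counting stored activations are all assumptions made for illustration.

```python
import math


def layer(w, x):
    """One toy 'layer': a scalar tanh unit (hypothetical stand-in for a real layer)."""
    return math.tanh(w * x)


def layer_grad(w, y):
    """d/dx tanh(w * x), written in terms of the layer output y."""
    return w * (1.0 - y * y)


def grad_store_all(weights, x0):
    """Baseline: keep every activation from the forward pass."""
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    grad = 1.0
    for w, y in zip(reversed(weights), reversed(acts[1:])):
        grad *= layer_grad(w, y)
    return grad, len(acts)


def grad_checkpointed(weights, x0, segment):
    """Keep only segment-boundary inputs; recompute the rest during backward."""
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer(w, x)
        if (i + 1) % segment == 0 and (i + 1) < len(weights):
            checkpoints[i + 1] = x
    grad, recomputed, peak = 1.0, 0, len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        local = [checkpoints[start]]
        for w in seg_weights:  # recompute this segment's activations
            local.append(layer(w, local[-1]))
            recomputed += 1
        # Rough peak: all checkpoints plus one segment's recomputed activations.
        peak = max(peak, len(checkpoints) + len(local) - 1)
        for w, y in zip(reversed(seg_weights), reversed(local[1:])):
            grad *= layer_grad(w, y)
    return grad, peak, recomputed


weights = [0.9, 1.1, 0.8, 1.2, 1.0, 0.7, 1.3, 0.95]
full_grad, full_stored = grad_store_all(weights, 0.5)
ckpt_grad, ckpt_peak, extra_forward = grad_checkpointed(weights, 0.5, segment=4)
```

With eight layers and a segment length of four, both paths produce the same gradient. The checkpointed path holds at most 6 values instead of 9, at the cost of 8 extra layer evaluations.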

16.

Flux Matching generalizes diffusion models beyond score functions (x.com)

Flux Matching learns broader vector fields with the data distribution as stationary, enabling faster mixing, interpretable dynamics, and structural priors

by @peterpaohuang (Peter Pao-Huang) · backlist 2026-05-12 · rubric 78.0

17.

PROWL: RL agents that find failures in world models

World models can improve by having reinforcement-learning agents explore simulators and games to discover adversarial trajectories and failure cases automatically

by @olivercameron (Oliver Cameron) · backlist 2026-05-12 · rubric 82.0

18.

Querying 14.3M Texas land parcels with DuckDB (x.com)

Modern laptop-scale geospatial tools can draw arbitrary regions, query millions of parcels, and tabulate surface ownership interactively

by @kyle_e_walker (Kyle Walker) · backlist 2026-05-12 · rubric 55.0

19.

Mouse, macaque, chicken, and turtle vasculature share scale-free geometry (t.co)

CODA maps suggest vascular systems across species obey the same space-filling fractal geometry with dimension three

by @deniswirtz (Denis Wirtz) · backlist 2026-05-12 · rubric 22.0

20.

Rendering realistic skies, sunsets, and planets in real time (t.co)

A full rendering walkthrough connects atmospheric light scattering, sunsets, and planet-scale views into a real-time graphics implementation

by @MaximeHeckel (Maxime) · backlist 2026-05-12 · rubric 52.0

21.

Fine-tuning π0.5 on 40 drone missions (x.com)

A robotics foundation model was adapted to fly a drone by outputting directional velocities inside the flight control loop rather than waypoint commands

by @__Rhodium__ (Lucas) · backlist 2026-05-12 · rubric 92.0

22.

Optimizing Software in C++ by Agner Fog (t.co)

Agner Fog’s regularly updated C++ optimization guide remains a compact reference for low-level performance work across CPUs and compilers

by @vivekgalatage (Vivek Galatage) · backlist 2026-05-12 · rubric 88.0

23.

MIT shrinking technique could enable optical computing devices (t.co)

A Boyden lab technique can fabricate nanoscale devices for manipulating visible light, potentially supporting future optical computing hardware

by @mcgovernmit (McGovern Institute) · backlist 2026-05-12 · rubric 24.0

24.

Google and SpaceX reportedly discuss orbital data centers

Orbital data centers are moving from speculative pitch to reported launch discussions between two of the companies capable of testing the idea

by @business (Bloomberg) · backlist 2026-05-12 · rubric 86.0

25.

OpenAI payments to Microsoft reportedly capped at $38B

New revenue-sharing terms would cap OpenAI’s payments to Microsoft far below a prior path that could have reached $135B through 2030

by @Techmeme · backlist 2026-05-12 · rubric 52.0

26.

A no-bid ICE defense vendor used a watermarked stock photo as an executive

A $12M no-bid contract led to a strange trail of corporate misrepresentation, including a fake-looking development chief still bearing a stock-photo watermark

by @ktyschwnk (katya schwenk) · backlist 2026-05-12 · rubric 72.0

27.

pnpm’s blockExoticSubdeps can blunt GitHub-reference package attacks

Setting a minimum release age is not enough if packages can still pull remote GitHub references, which pnpm can block with blockExoticSubdeps

by @ramimacisabird (Rami McCarthy) · backlist 2026-05-12 · rubric 61.0
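
To make the idea of a "remote GitHub reference" concrete, here is a rough sketch of flagging dependency specifiers that bypass the registry. It is not pnpm's implementation of blockExoticSubdeps: the prefix list, the shorthand regex, and the function names are assumptions chosen for illustration, and pnpm's own definition of an exotic source may differ.

```python
import re

# Assumed, non-exhaustive list of specifier prefixes that point outside the registry.
EXOTIC_PREFIXES = (
    "git+", "git://", "github:", "gitlab:", "bitbucket:", "http://", "https://",
)
# Assumed pattern for the "user/repo" or "user/repo#ref" GitHub shorthand.
GITHUB_SHORTHAND = re.compile(r"^[\w.-]+/[\w.-]+(#.+)?$")


def is_exotic(spec: str) -> bool:
    """Return True if a dependency specifier looks like a git or URL reference."""
    spec = spec.strip()
    if spec.startswith(EXOTIC_PREFIXES):
        return True
    return bool(GITHUB_SHORTHAND.match(spec))


def exotic_dependencies(deps: dict) -> dict:
    """Filter a name -> specifier mapping down to the exotic entries."""
    return {name: spec for name, spec in deps.items() if is_exotic(spec)}


# Hypothetical dependency map, as it might appear in a package.json.
example = {
    "left-pad": "^1.3.0",
    "some-fork": "github:someuser/some-fork#main",
    "tarball-dep": "https://example.com/pkg.tgz",
    "shorthand": "someuser/somerepo",
}
flagged = exotic_dependencies(example)
```

A release-age gate only sees registry versions, so entries like the three flagged here would never pass through it at all. That is the gap the post describes.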

28.

ChatGPT’s Study Mode disappears while Claude and Gemini keep theirs

Removing tutoring-style interaction pushes students back toward answer-giving assistants, which can create the illusion of learning without retention

by @emollick (Ethan Mollick) · backlist 2026-05-12 · rubric 24.0

29.

Modded-NanoGPT optimization result #14 (2026/05/04): (x.com)

Modded-NanoGPT optimization result #14 (2026/05/04): @Sam_Acqua has achieved a new record of 3150 steps (-60), by adding SOAP preconditioning before Muon orthogonalization for the MLP weights (SOAP-Muon).

by @kellerjordan0 (Keller Jordan) · backlist 2026-05-12 · rubric 96.0

30.

Launching Agentick

Launching Agentick. A unified benchmark for training and evaluating general sequential decision-making agents. RL agents, LLMs, VLMs, hybrids, bots, and humans can all be evaluated on: same tasks. same seeds. same score. First result: n

by @creus_roger (Roger Creus Castanyer) · backlist 2026-05-12 · rubric 96.0

31.

Modded-NanoGPT optimization result #13: (x.com)

Modded-NanoGPT optimization result #13: @benjamintherien has achieved a new record of 3210 steps (-15), by wrapping NorMuonH in a MuLoCo-style outer Nesterov SGD. Compared to the target loss, this result has a p-value of p=1.3e-4. Compar

by @kellerjordan0 (Keller Jordan) · backlist 2026-05-12 · rubric 96.0

32.

You can have between 10,000 and 5 million sandboxes running concurrently on (x.com)

You can have between 10,000 and 5 million sandboxes running concurrently on @daytonaio. Try calling AWS and asking for that type of concurrency. They'll ask a ton of questions, and it's going to take a lot of time to get provisioned. For

by @ivanburazin (Ivan Burazin) · backlist 2026-05-12 · rubric 94.0

33.

GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve publis…

GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack quantifying the throughput benefits compared to serving on Hoppers.

by @AravSrinivas (Aravind Srinivas) · backlist 2026-05-12 · rubric 94.0

34.

Microsoft is investigating mistralai PyPI package v2.4.6 compromise. Attackers injected code in mistralai/client/…

Microsoft is investigating mistralai PyPI package v2.4.6 compromise. Attackers injected code in mistralai/client/__init__.py that executes on import, downloads hxxps://83[.]142[.]209[.]194/transformers.pyz to /tmp/transformers.pyz, and laun

by @MsftSecIntel (Microsoft Threat Intelligence) · backlist 2026-05-12 · rubric 94.0
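
Because the injected code reportedly executes on import, one cautious way to check an environment is to read the installed version from package metadata without importing the package. The sketch below uses Python's standard importlib.metadata. The flagged-version table contains only the version named in the excerpt above; whether other versions are affected is not stated there, and the helper names are hypothetical.

```python
from importlib import metadata

# Only the version named in the excerpt; treat this table as an assumption to update.
FLAGGED_VERSIONS = {"mistralai": {"2.4.6"}}


def is_flagged(package, version, flagged=FLAGGED_VERSIONS):
    """Return True if (package, version) appears in the flagged table."""
    return version in flagged.get(package, set())


def installed_version(package):
    """Read the installed version from metadata; this does not import the package."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check(package="mistralai"):
    """Summarize the local state of one package as a short string."""
    version = installed_version(package)
    if version is None:
        return f"{package}: not installed"
    if is_flagged(package, version):
        return f"{package} {version}: FLAGGED, do not import"
    return f"{package} {version}: not in the flagged table"


report = check()
```

A clean result here only means the installed version is not in the table. It does not show that the environment was never exposed.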

35.

San Francisco Compute builds both the financial & technical layer.

San Francisco Compute builds both the financial & technical layer. We build the order book & then we build everything else: the VM orchestration, the clusters, and the data centers. We were the first to do this, we hit scale, are growing

by @evanjconrad (evan conrad) · backlist 2026-05-12 · rubric 92.0

36.

super cool and ambitious initiative to try to fully automate a significant chunk of optimization research! also c…

super cool and ambitious initiative to try to fully automate a significant chunk of optimization research! also can't wait for this low-rank hyperparameter transfer paper.

by @tonysilveti (Tony S.F.) · backlist 2026-05-12 · rubric 92.0

37.

Crazy I was able to make a ~1200 page math document with Codex that should be correct after agents checking throu…

Crazy I was able to make a ~1200 page math document with Codex that should be correct after agents checking through it over and over. /goal is cool

by @AcerFur (Acer) · backlist 2026-05-12 · rubric 92.0

38.

cool work :)

cool work :) if you've ever tried world models, you know how easily they break (e.g. stare into grass in minecraft and you'll easily fall OOD). using RL to find adversarial trajectories and then improve the world models is great - esp. if

by @arnie_hacker (Arnie Ramesh) · backlist 2026-05-12 · rubric 91.0

39.

The elegance of Slime lies in combining the best existing components (SGLang + Megatron + Ray) in the cleanest way.

The elegance of Slime lies in combining the best existing components (SGLang + Megatron + Ray) in the cleanest way. Its top-level logic is simple with only dozens of lines yet each module has enough depth to handle complex engineering det

by @sheriyuo (Xiuyu Li) · backlist 2026-05-12 · rubric 91.0

40.

Crabbox 0.12.0 is live

Crabbox 0.12.0 is live Azure Windows desktop + WSL2 Proxmox + Tensorlake providers preflight, failure bundles, phase timing keep failed boxes around for SSH debugging Remote test boxes got much less slippery.

by @steipete (Peter Steinberger) · backlist 2026-05-12 · rubric 91.0

41.

The probability of this result not reaching the target loss is p=1.1e-4. The probability of this result being obt…

The probability of this result not reaching the target loss is p=1.1e-4. The probability of this result being obtainable by shortening the previous record is p=1.4e-6. Reproducible log:

by @kellerjordan0 (Keller Jordan) · backlist 2026-05-12 · rubric 90.0

42.

@augmentcode (x.com)

@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offlo

by @_inception_ai (Inception) · backlist 2026-05-12 · rubric 90.0

43.

1/ A single neuron is sufficient to bypass safety alignment in LLMs.

1/ A single neuron is sufficient to bypass safety alignment in LLMs. Across 7 models, 2 families, and scales from 1.7B to 70B, suppressing one MLP neuron bypasses refusal behavior — with no fine-tuning and no prompt engineering. We call

by @hamid_kazemi22 (Hamid Kazemi) · backlist 2026-05-12 · rubric 90.0

44.

You can extend every step of Claude Code's agentic loop. I've been thinking a lot about what that means for the l…

You can extend every step of Claude Code's agentic loop. I've been thinking a lot about what that means for the last one. What are you doing to help Claude verify its own work? Genuinely want to hear what workflows people have.

by @delba_oliveira (Delba) · backlist 2026-05-12 · rubric 88.0

45.

We’ve built a tool called Genie that turns meetings into software.

We’ve built a tool called Genie that turns meetings into software. If someone on the team says “I wish we had a tool for X” during a meeting, Genie automatically builds it. How it works: • analyzes granola meeting transcripts • creates L

by @sethbannon (Seth Bannon) · backlist 2026-05-12 · rubric 88.0

46.

Also, I realized that JAX itself isn't magic per-se. E.g. training a regular GPT2 on the latest 6th gen TPU hardw…

Also, I realized that JAX itself isn't magic per-se. E.g. training a regular GPT2 on the latest 6th gen TPU hardware is around 85 minutes, while modded GPT2 on PyTorch can do under 2 minutes

by @brandon_xyzw (Brandon) · backlist 2026-05-12 · rubric 88.0

47.

Recently we showed that the minimax optimal rate for multicalibration is T^{2/3}. But that doesn't mean you have …

Recently we showed that the minimax optimal rate for multicalibration is T^{2/3}. But that doesn't mean you have to do that badly on all instances. We give an algorithm that can adapt to easy instances and get better rates while still being

by @Aaroth (Aaron Roth) · backlist 2026-05-12 · rubric 88.0

48.

This is a really fun and multi purpose feature! I currently use these APIs to hold and then cleanly evict kv cach…

This is a really fun and multi purpose feature! I currently use these APIs to hold and then cleanly evict kv cache from spawned subagents since StreamingSessions are not added to the RadixTree or written to lower memory tiers.

by @0xishand (ishan) · backlist 2026-05-12 · rubric 88.0

49.

Hmm, could not handle the FOMO of (x.com)

Hmm, could not handle the FOMO of @antirez DS4 so I made it work on my Strix Halo using ROCm HIPify

by @gonizahavy (goniz) · backlist 2026-05-12 · rubric 88.0

50.

Verification bottlenecks progress. Bandwidth bottlenecks verification.

by @xiuyu_l (Xiuyu Li) · backlist 2026-05-12 · rubric 88.0

51.

Real time, multimodal, full duplex. Super excited about this model.

Real time, multimodal, full duplex. Super excited about this model. You can also feel the tremendous multimodal infra behind this demo.

by @zhzHNN (Huaizheng Zhang) · backlist 2026-05-12 · rubric 88.0

52.

We started out trying to benchmark the AIs...

We started out trying to benchmark the AIs... We had experts create the benchmark... we had experts validate the benchmark... Then AIs started doing well on the benchmark... Now AIs found critical errors in the benchmark itself the human

by @peterwildeford (Peter Wildeford) · backlist 2026-05-12 · rubric 86.0

53.

1/ Following our previous MoE paper w/ (x.com)

1/ Following our previous MoE paper w/ @hayou_soufiane (https://arxiv.org/abs/2604.09780), we confirmed that scaling the residual stream: h^{\ell+1} = h^{\ell} + \alpha \Delta^{\ell} improves MoE load balancing at initialization by reduci

by @xidulu (Xidulu) · backlist 2026-05-12 · rubric 86.0

54.

The first ProgramBench task was just solved by GPT 5.5 high/xhigh. Interestingly, high/xhigh picked two different…

The first ProgramBench task was just solved by GPT 5.5 high/xhigh. Interestingly, high/xhigh picked two different languages for the task (C vs Python). GPT 5.5 xhigh was significantly better than Opus 4.7 xhigh in all metrics.

by @KLieret (Kilian Lieret) · backlist 2026-05-12 · rubric 86.0

55.

The third semis memo is out

The third semis memo is out. We talk about power & analog semis, orchestration plane in the agentic era, the neoclouds trade, interconnect bottleneck (probably the biggest limiter for 2026-27), Korea Unlocked

by @zephyr_z9 (Zephyr) · backlist 2026-05-12 · rubric 86.0

56.

Today: OpenMed Agent ships in preview. (x.com)

Today: OpenMed Agent ships in preview. Built on @huggingface: → HF endpoints power clinical extraction + terminology → MCP for your own services → Every tool call, every plan, fully visible. 1,000+ OpenMed medical models on HF. Preview

by @MaziyarPanahi (Maziyar PANAHI) · backlist 2026-05-12 · rubric 86.0

57.

Cost_train = Cost_inference (t.co)

Cost_train = Cost_inference. It's never too late, http://arxiv.org/abs/2503.14647 Towards More Economical Context-Augmented LLM Generation by Reusing Stored KV Cache

by @sheriyuo (Xiuyu Li) · backlist 2026-05-12 · rubric 86.0

58.

2.5GB cold start in less than 2 minutes...

2.5GB cold start in less than 2 minutes... At my previous company, the optimization I did for serverless deployment of large models was 8GB cold start in less than 20 seconds.

by @yetone · backlist 2026-05-12 · rubric 86.0

59.

What if letting frontier LLMs design their own test-time scaling strategies is much easier than it sounds?

What if letting frontier LLMs design their own test-time scaling strategies is much easier than it sounds? Introducing AutoTTS — an environment-driven discovery framework. Humans define the right environment; frontier coding agents discove

by @zhengtoong (Tong Zheng) · backlist 2026-05-12 · rubric 86.0

60.

seems like amd's ATOM is capable of providing the FASTEST open source inference. (x.com)

seems like amd's ATOM is capable of providing the FASTEST open source inference. we used AMD's atom to beat all other providers on @ArtificialAnlys and provided the code below. truly excited for the new wave of heterogeneous compute!!!

by @gpusteve (steve) · backlist 2026-05-12 · rubric 85.0

61.

Read this article carefully

Read this article carefully. U will be hearing a lot more about the PCB/interconnect bottleneck when mass production of TPU v8, Rubin, and Trainium3 starts in Q4 2026

by @zephyr_z9 (Zephyr) · backlist 2026-05-12 · rubric 84.0

62.

7/ Regarding the frontend/backend design (what they call the "interaction" and "background" models):

7/ Regarding the frontend/backend design (what they call the "interaction" and "background" models): (i) How do you teach the frontend model when to defer to the backend? LLMs famously have problems knowing what they don't know. For a

by @rdesh26 (Desh Raj) · backlist 2026-05-12 · rubric 84.0

63.

Congrats to (x.com)

Congrats to @andrew_li03 and the @JudgementLabs team on their fundraise! I vividly remember back in early 2025 when Andrew explained to me why agent monitoring and evaluation would be so crucial for any enterprise. It’s awesome to see t

by @zeeshanp_ (Zeeshan Patel) · backlist 2026-05-12 · rubric 84.0

64.

And, of course, they should be plotted with compute, latency, or cost on the x-axis.

by @polynoamial (Noam Brown) · backlist 2026-05-12 · rubric 84.0

65.

Preparing an AI evaluation budget by just estimating how many human hours the task would take and discounting pre… (x.com)

Preparing an AI evaluation budget by just estimating how many human hours the task would take and discounting prevailing human wages. This is @joel_bkr's thought.

by @GregHBurnham (Greg Burnham) · backlist 2026-05-12 · rubric 84.0

66.

xAI revamps the "Grok Computer" section that was mentioned last week.

xAI revamps the "Grok Computer" section that was mentioned last week. Now the setting more accurately says "Work Folder" and gives you 2 options: Default (Grok's sandbox "computer") or Google Drive. This will allow Grok to work dire

by @blankspeaker (ᅠ‏ ᅠ ᅠ) · backlist 2026-05-12 · rubric 84.0

67.

it's very important our inference business has customers world wide

it's very important our inference business has customers world wide. the entire game is keeping GPUs busy. hard to do that if your customers are all in one timezone

by @thdxr (dax) · backlist 2026-05-12 · rubric 84.0

68.

Even without releasing Mythos, cyberattack threat surfaces are way larger than you'd think. Attackers can put ag…

Even without releasing Mythos, cyberattack threat surfaces are way larger than you'd think. Attackers can put agents into an RL-like loop until they find vulnerabilities and lift attack success rates. Expect to see big scale-up of cyberatt

by @PeterHndrsn (Peter Henderson) · backlist 2026-05-12 · rubric 84.0

69.

Diffusion world models can help test and improve robot policies before running them on real robots. (t.co)

Diffusion world models can help test and improve robot policies before running them on real robots. But can the choice of latent space make the WM more faithful? We show that semantic spaces beat reconstruction spaces on task relevant met

by @nilaksh404 (Nilaksh) · backlist 2026-05-12 · rubric 84.0

70.

1/

1/ The "20 tokens per parameter" Chinchilla scaling law is flawed. It is an artifact of your tokenizer. Scaling shouldn't be measured in tokens at all. It should be measured in bytes.

by @che_shr_cat (Grigory Sapunov) · backlist 2026-05-12 · rubric 84.0

71.

one of the big challenges with byte-based (hierarchical) LLMs is the slow decoding, since that is byte by byte, e…

one of the big challenges with byte-based (hierarchical) LLMs is the slow decoding, since that is byte by byte, even with a smaller decoder. Glad to see model architectures addressing this bottleneck.

by @pieterdelobelle (Pieter Delobelle) · backlist 2026-05-12 · rubric 84.0

72.

Benchmark on AI literature review quality: depth × reliability × breadth.

Benchmark on AI literature review quality: depth × reliability × breadth. DeepSeek-V4-Pro (1 in 92) and Claude Opus 4.7 (0 in 104) show the lowest hallucination rates on this task. DeepSeek-V4-Pro’s writing ability feels like a real qual

by @sheriyuo (Xiuyu Li) · backlist 2026-05-12 · rubric 84.0

73.

On-policy distillation (OPD) is one of the most effective LLM post-training methods, but it traditionally require…

On-policy distillation (OPD) is one of the most effective LLM post-training methods, but it traditionally requires a costly live teacher server throughout training. In our latest work, Lightning OPD, we show that OPD can be performed fully

by @hancai_hm (Han Cai) · backlist 2026-05-12 · rubric 84.0

74.

thoughts after doing a bunch of synthetic data gen for eval + environment building

thoughts after doing a bunch of synthetic data gen for eval + environment building - LLMs are incredible projections of the world bundled into a set of weights - but doing targeted extraction of certain distributions from those weights is

by @Vtrivedy10 (Viv) · backlist 2026-05-12 · rubric 84.0

75.

Not all diffusion noise is equally useful for training!

Not all diffusion noise is equally useful for training! We introduce NoiseRater: a meta-learned framework that scores and selects informative noise instances during diffusion training. Instead of treating Gaussian noise uniformly, we lear

by @WUFang40615703 (Fang Wu) · backlist 2026-05-12 · rubric 84.0

76.

Update 5:05 PT: The attack has now expanded well beyond (x.com)

Update 5:05 PT: The attack has now expanded well beyond @TanStack and @Mistral. 373 malicious package-version entries across 169 npm package names, including @uipath, @squawk, @tallyui, @beproduct, and more. The malware propa

by @AikidoSecurity (Aikido Security) · backlist 2026-05-12 · rubric 84.0

77.

How well do MLLMs and agentic video frameworks handle questions (e.g., tracking objects or abstracting recurring …

How well do MLLMs and agentic video frameworks handle questions (e.g., tracking objects or abstracting recurring behavior patterns) over long-horizon videos, which often require memory to retrieve and aggregate information across time? To

by @hyunji_amy_lee (hyunji amy lee) · backlist 2026-05-12 · rubric 83.0

78.

My first Cloudflare ship is in wrangler@4.90.1

My first Cloudflare ship is in wrangler@4.90.1. It fixes remote bindings hanging indefinitely when closing a wrangler dev session. A little side ship while I was working my way through onboarding, simple fix but tricky to nail down - fun w

by @_ashleypeacock (Ashley Peacock) · backlist 2026-05-12 · rubric 83.0

79.

Update on the jax-js thing: I've gone back to TensorFlow.js

Update on the jax-js thing: I've gone back to TensorFlow.js. It's simpler (direct WGSL instead of 2 compilation stages), and performance was easier to improve in tf.js. Kernel fusion only gives you marginal benefits, and for interpretabilit

by @brandon_xyzw (Brandon) · backlist 2026-05-12 · rubric 83.0

80.

There will be many winners and losers in the next 10 years while the stack for AI compute matures.

There will be many winners and losers in the next 10 years while the stack for AI compute matures. If you want to succeed, focus on timeless numbers like FLOPS/$, FLOPS/W, GB/s/$ and GB/$. Any focus on applications will have a short shelf

by @__tinygrad__ (the tiny corp) · backlist 2026-05-12 · rubric 82.0

81.

Compare Speech to Speech models on Tau voice: (t.co)

Compare Speech to Speech models on Tau voice: https://artificialanalysis.ai/speech-to-speech … Methodology: https://artificialanalysis.ai/speech-to-speech/methodology …

by @ArtificialAnlys (Artificial Analysis) · backlist 2026-05-12 · rubric 82.0

82.

someone already wrote a love letter to pi, by (x.com)

someone already wrote a love letter to pi, by @badlogicgames. so we wrote a love paper to pi :) with my teammates @xuzihuan4 and @lintool. a few days ago, i promised i’d share some fun plots once Pi-Serini joined the BrowseComp-Plu

by @mattjustram (Jheng-Hong Yang) · backlist 2026-05-12 · rubric 82.0

83.

These days, companies are struggling to keep their AI agents from running amuck.

These days, companies are struggling to keep their AI agents from running amuck. Judgment Labs, led by 22-year-old Alex Shan, is tackling agent monitoring and evals and raised 2 back-to-back rounds from Lightspeed, most recently at a $175m

by @steph_palazzolo (Stephanie Palazzolo) · backlist 2026-05-12 · rubric 82.0

84.

For anyone building scientific agents on top of stochastic generative tools, another proof that the bottleneck ri… (x.com)

For anyone building scientific agents on top of stochastic generative tools, another proof that the bottleneck right now is the evaluate-and-filter loop, not the model and not the tool catalog. Striking new benchmark of LLM agents for prot

by @SylvainGariel (Sylvain Gariel) · backlist 2026-05-12 · rubric 82.0

85.

if you are building knowledge worker agents that require more setup than the equivalent of "here's a laptop in th…

if you are building knowledge worker agents that require more setup than the equivalent of "here's a laptop in the mail show up to the office Tuesday at 8:30am" you're ngmi

by @_lopopolo (Ryan Lopopolo) · backlist 2026-05-12 · rubric 82.0

86.

There is a consistent thread among frontier researchers: the best training grounds for model breakthroughs are do…

There is a consistent thread among frontier researchers: the best training grounds for model breakthroughs are domains with massive, discrete search spaces and easily verifiable outcomes. Think of Sudoku. In principle, you can brute-force

by @richa_lq (Richa Sharma) · backlist 2026-05-12 · rubric 81.0

87.

The tradeoff here should be trading a slower cold start speed for a faster inference speed, similar to how vLLM p…

The tradeoff here should be trading a slower cold start speed for a faster inference speed, similar to how vLLM pre-caches a CUDA graph during each startup to reduce the overhead of continuously launching kernels during inference. Since thi

by @thesophiaxu (Sophia Xu) · backlist 2026-05-12 · rubric 79.0

88.

this is exactly correct. if the agent can’t use it immediately, it won’t be used. build the software for agents f…

this is exactly correct. if the agent can’t use it immediately, it won’t be used. build the software for agents first, humans are stakeholders. challenge: you have to build for the dumbest agent model someone might use. “doesn’t work! Btw

by @fujikanaeda (Eric W. Tramel) · backlist 2026-05-12 · rubric 79.0

89.

the window has closed on building products that I, the customer, must integrate with. I should be able to drop ag…

the window has closed on building products that I, the customer, must integrate with. I should be able to drop agents into the workspace with zero setup and they set themselves up

by @_lopopolo (Ryan Lopopolo) · backlist 2026-05-12 · rubric 79.0

90.

One fun accidental discovery during my PhD was when I accidentally heated up my superconducting resonator by spam…

One fun accidental discovery during my PhD was when I accidentally heated up my superconducting resonator by spamming the piezo motor in the dil fridge to see if it was even working. This is the resonator thermal noise peak, the frequency s

by @SinghJyotirmai (Jyotirmai Singh) · backlist 2026-05-12 · rubric 79.0