Backlist — 28 May 2026 UTC

1.

Finding miscompiles for fun, not profit (t.co)

Compiler-fuzzing and model-assisted debugging can burn five figures in an afternoon while still surfacing real miscompilation bugs

by @SemiAnalysis_ (SemiAnalysis) · backlist 2026-05-28 · rubric 96.0

2.

Amazon’s new data center networking approach

A networking breakthrough inside hyperscale data centers can change the cost and energy profile of AI clusters more than a marginal model update

by @LaurenGoode (Lauren Goode) · backlist 2026-05-28 · rubric 72.0

3.

Hugging Face makes async RL weight sync about 100x cheaper

RL training pipelines spend huge bandwidth moving fresh weights to inference engines, and a 100x reduction removes a major constraint on distributed post-training

by @ClementDelangue (clem) · backlist 2026-05-28 · rubric 92.0

4.

Linux kernel CVEs hit a record pace in May

May reached 892 Linux kernel CVEs without backfilled entries, showing how vulnerability accounting and kernel maintenance are entering a new volume regime

by @spendergrsec (Brad Spengler) · backlist 2026-05-28 · rubric 85.0

5.

CZI open-sources a world model of protein biology (x.com)

Open models and data for protein biology let outside labs inspect, reproduce, and extend work that could matter for understanding human physiology

by @jpineau1 (Joelle Pineau) · backlist 2026-05-28 · rubric 88.0

6.

RL post-training pushes vision-language-action models past 95% reliability (t.co)

EXPO-FT reports perfect success on eight tested robot tasks using only about 19 minutes of reinforcement-learning data on average

by @chelseabfinn (Chelsea Finn) · backlist 2026-05-28 · rubric 89.0

7.

MolmoAct 2 releases full robotics training code and data

Fine-tuning scripts, datasets, evaluation rollouts, and tokenizer recipes make a robotics foundation model practical for outside teams to build on

by @allen_ai (Ai2) · backlist 2026-05-28 · rubric 78.0

8.

CUDA puzzles with free GPU tests (x.com)

A browser-accessible CUDA puzzle set lowers the barrier to learning GPU kernel programming without local hardware or rental setup

by @prathamgrv (pdawg) · backlist 2026-05-28 · rubric 82.0

9.

Doom ported to Three.js (t.co)

A playable Doom-in-Three.js port is a durable browser graphics artifact with code others can study and extend

by @mrdoob · backlist 2026-05-28 · rubric 82.0

10.

CortexMAE and Brainmarks: foundation models for fMRI

Training on 2.1k hours of open fMRI data plus an open benchmark gives brain-imaging researchers a shared baseline for representation learning

by @SophontAI (Sophont) · backlist 2026-05-28 · rubric 62.0

11.

Octopool: pool GitHub tokens behind a Cloudflare Worker

Teams hitting GitHub API limits can share PATs and GitHub App installations behind a cached self-hosted shim instead of rewriting workflows

by @steipete (Peter Steinberger) · backlist 2026-05-28 · rubric 84.0

12.

Exporting Triton code from torch.compile (t.co)

A common informal practice of copy-pasting generated Triton kernels from torch.compile is being turned into a cleaner API surface

by @ezyang (Edward Z. Yang) · backlist 2026-05-28 · rubric 78.0

13.

tinygrad’s GPU driver compiles interactions to C

Moving GPU orchestration into generated C can cut CPU overhead and simplify the runtime path once kernels are running

by @__tinygrad__ (the tiny corp) · backlist 2026-05-28 · rubric 92.0

14.

MONET: a 105M-sample Apache-2.0 image-text dataset (x.com)

A deduplicated, recaptioned, openly licensed image corpus plus a text-to-image training codebase improves reproducibility for image generation research

by @CChadebec (Clément Chadebec) · backlist 2026-05-28 · rubric 44.0

15.

A new construction in sum-product theory

The paper constructs arbitrarily large finite real sets where both sums and products are smaller than expected, challenging a central additive-combinatorics intuition

by @mehtaab_sawhney (Mehtaab Sawhney) · backlist 2026-05-28 · rubric 16.0

16.

Does antitrust enforcement help venture investment?

The study finds VC investment falls in areas less protected by antitrust enforcement, complicating the claim that weaker enforcement always helps startups

by @credistick (Dan Gray) · backlist 2026-05-28 · rubric 27.0

17.

Anthropic rounds versus chip stocks

A dollar-cost-averaged bet across Anthropic’s recent rounds slightly underperformed SK Hynix over the same dates, tying AI startup returns to the hardware trade

by @rleshner (Robert Leshner) · backlist 2026-05-28 · rubric 58.0

18.

Atomic searchers lose about $1.5M per month to high-fee Uniswap legs

Routing atomic arbitrage through expensive intermediary pools is leaking measurable MEV profits and creating room for lower-friction execution venues

by @cryptoquantHQ (CryptoQuant) · backlist 2026-05-28 · rubric 80.0

19.

LLM agents leak more private data after seeing other agents overshare

In a 2,533-agent social simulation, private-data disclosure became about eight times more likely after agents observed another agent oversharing

by @AmanPriyanshu6 (Aman Priyanshu @ ACM CAIS 2026) · backlist 2026-05-28 · rubric 61.0

20.

The evidence problem in AI law

AI liability claims may fail when the proof sits inside proprietary models, platform logs, protected databases, or internal documents that plaintiffs cannot access

by @cen_sarah (Sarah Cen) · backlist 2026-05-28 · rubric 84.0

21.

Visual Studio extensions remain a soft attack surface

MDSec showed malicious Visual Studio extensions can still reach the marketplace and execute with minimal controls, keeping IDE supply chains exposed

by @DFIR_Radar (DFIR Radar) · backlist 2026-05-28 · rubric 62.0

22.

India’s national education board responds to a hack with a generated image

After a teenager reportedly showed marks for 2M test takers could be edited, the board’s public reassurance leaned on a ChatGPT-generated image rather than technical remediation details

by @deedydas (Deedy) · backlist 2026-05-28 · rubric 79.0

23.

AI-generated text in Doctor of Education dissertations

A 100-dissertation sample found that more than half contained some amount of AI-generated text, suggesting credentialed academic writing is already changing

by @pangramlabs (Pangram Labs) · backlist 2026-05-28 · rubric 62.0

24.

1,263 km autonomous drive across Canada with zero interventions (x.com)

A coast-to-coast autonomous driving run covered 788 miles in one day without disengagements, giving a concrete measure of progress outside curated demos

by @scotsrule08 (Spencer) · backlist 2026-05-28 · rubric 78.0

25.

Low-end microcontrollers are approaching PCIe-era signaling

The RP2350’s HSTX peripheral already reaches hundreds of megabits per second, making PCIe-class interfaces on cheap microcontrollers plausible within a decade

by @ptrschmdtnlsn (Peter Schmidt-Nielsen) · backlist 2026-05-28 · rubric 66.0

26.

Hermeus gets a $159M DIU contract expansion for high-Mach flight data (x.com)

A $219M total ceiling from DIU, the Air Force, and the Navy gives Hermeus a major government-backed path to generate high-Mach flight data

by @hermeuscorp (Hermeus) · backlist 2026-05-28 · rubric 2.0

27.

Long context benchmark suite (t.co)

https://arxiv.org/abs/2605.28079 Long context benchmark suite. It aggregates previous benchmarks.

by @rosinality (Rosinality) · backlist 2026-05-28 · rubric 93.0

28.

OK FIRST EVAL: CODEX RUNNING /goal

OK FIRST EVAL: CODEX RUNNING /goal VS. CLAUDE CODE ORCHESTRATING CODEX AGENTS I have an ACTUAL long form tasks I have to finish. I created two separate worktrees This one is a full migration of services from Supabase to self-hosted Po

by @KingBootoshi (BOOTOSHI) · backlist 2026-05-28 · rubric 92.0

29.

1 of 8 NVIDIA RTX PRO 6000 Blackwell being torn down for tinybox pro install. Don't worry, it's only $10,000 if y…

1 of 8 NVIDIA RTX PRO 6000 Blackwell being torn down for tinybox pro install. Don't worry, it's only $10,000 if you shear one of the ribbon cables.

by @__tinygrad__ (the tiny corp) · backlist 2026-05-28 · rubric 92.0

30.

i've recently been distilling stockfish into a no-search transformer. some cool results:

i've recently been distilling stockfish into a no-search transformer. some cool results: - recreated the neural scaling laws - observed chinchilla optimality - curriculum learning on ascending depth data sub-performs you can also play agai

by @rohankalia_ (rohan) · backlist 2026-05-28 · rubric 91.0

31.

Production agents also change state.

Production agents also change state. If an agent claims it updated a CRM, opened a PR, changed cloud config, or triggered a workflow, the eval should verify what actually happened. Agent Judge can inspect tool evidence, database logs, aud

by @JudgmentLabs (Judgment Labs) · backlist 2026-05-28 · rubric 90.0

32.

We built Agent Judge to evaluate long-horizon agents.

We built Agent Judge to evaluate long-horizon agents. As agents take on longer tasks, the evidence needed to evaluate them gets buried across tool calls, retries, logs, database updates, and final outputs. Evaluating these agents requires

by @JudgmentLabs (Judgment Labs) · backlist 2026-05-28 · rubric 90.0

33.

When they release Mythos it’ll prob be ~$20,000 per each full-repo scan. The hype helps justify the price. You’ll…

When they release Mythos it’ll prob be ~$20,000 per each full-repo scan. The hype helps justify the price. You’ll still need alternatives.

by @IceSolst (solst/ICE of Astarte) · backlist 2026-05-28 · rubric 88.0

34.

so I ran into this little problem where the 500km^2 smoke data made it clear that there were other fires going on…

so I ran into this little problem where the 500km^2 smoke data made it clear that there were other fires going on at the same time, and it was weird that they weren't visualized. so this 1-fire-dataviz project became a 50-fire-dataviz proje

by @codetaur (Codetaur) · backlist 2026-05-28 · rubric 88.0

35.

Congrats to the (x.com)

Congrats to the @liquidai team on LFM2.5-8B-A1B! Day-0 support is now live in SGLang. - 8B MoE, 1.5B active - Fast tool calling, punches 4x its size - 128K context + better non-Latin support - Runs local, no API keys, no data leaving

by @lmsysorg (LMSYS Org) · backlist 2026-05-28 · rubric 88.0

36.

In the Vending-Bench Arena, Opus 4.8 lost to GPT-5.5 and Opus 4.7. It falls for scam suppliers (one run sent over…

In the Vending-Bench Arena, Opus 4.8 lost to GPT-5.5 and Opus 4.7. It falls for scam suppliers (one run sent over $9,000 to a "membership" upsell), is worse at negotiation, runs the machine empty, overprices, and wastes time on strategy not

by @andonlabs (Andon Labs) · backlist 2026-05-28 · rubric 88.0

37.

One reason to not bet on diffusion is that there is a limit to the capability of diffusion models for serial prob… (t.co)

One reason to not bet on diffusion is that there is a limit to the capability of diffusion models for serial problems. This paper (http://arxiv.org/abs/2507.12549) shows from a complexity theory perspective that a diffusion model inherent

by @zeeshanp_ (Zeeshan Patel) · backlist 2026-05-28 · rubric 88.0

38.

The beauty of the charming and amazing countryside of Syria

by @aseelswaid9 (Aseel Swaid) · backlist 2026-05-28 · rubric 86.0

39.

Learnings from testing Claude Opus 4.8:

Learnings from testing Claude Opus 4.8: > Much worse than Opus 4.7 and GPT 5.5 on Vending Bench > More aligned than previous Claude models (Opus 4.6+ and Mythos) > Also worse on Blueprint-Bench > Scared of getting caught > Max reasoning is

by @andonlabs (Andon Labs) · backlist 2026-05-28 · rubric 86.0

40.

On the first partial frontier are Deepgram Flux (7.36%, 0.019s), Deepgram Nova-3 Realtime (6.69%, 0.057s), Cartes…

On the first partial frontier are Deepgram Flux (7.36%, 0.019s), Deepgram Nova-3 Realtime (6.69%, 0.057s), Cartesia Ink-2 (external endpoints) (4.33%, 0.072s), and ElevenLabs Scribe v2 Realtime (3.65%, 0.132s).

by @ArtificialAnlys (Artificial Analysis) · backlist 2026-05-28 · rubric 86.0
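
The four models named in the item above are described as sitting on a frontier. As a minimal sketch of what that means, the Python below checks that none of the four quoted points dominates another. It assumes the first number in each pair is a word error rate (lower is better) and the second is latency in seconds (lower is better); the post does not spell this out, and the helper names `dominates` and `pareto_frontier` are illustrative, not part of any benchmark tooling.

```python
# Points quoted in the item above: (model, assumed WER %, assumed latency in seconds).
points = [
    ("Deepgram Flux", 7.36, 0.019),
    ("Deepgram Nova-3 Realtime", 6.69, 0.057),
    ("Cartesia Ink-2 (external endpoints)", 4.33, 0.072),
    ("ElevenLabs Scribe v2 Realtime", 3.65, 0.132),
]


def dominates(a, b):
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_frontier(items):
    """Keep every item that no other item dominates."""
    return [p for p in items if not any(dominates(q, p) for q in items if q is not p)]


frontier = pareto_frontier(points)
for name, wer, latency in frontier:
    print(f"{name}: WER {wer}%, latency {latency}s")
```

Under those assumptions all four points survive, because each step down in error rate costs a step up in latency.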

41.

I had a clanker rewrite ripgrep in Swift then spend a bunch of time optimizing it

I had a clanker rewrite ripgrep in Swift then spend a bunch of time optimizing it It’s now faster than the original Rust

by @mweinbach (Max Weinbach) · backlist 2026-05-28 · rubric 86.0

42.

Play with the demos. Training up to 20M steps/second on a single GPU. Most envs training in seconds to minutes, i…

Play with the demos. Training up to 20M steps/second on a single GPU. Most envs training in seconds to minutes, including our client envs. Turns out mazes and 2048 without exploiting domain knowledge are just harder than many real world pro

by @jsuarez (Joseph Suarez) · backlist 2026-05-28 · rubric 86.0

43.

One thing which has been insanely difficult is generalizing pricing "migrations"

One thing which has been insanely difficult is generalizing pricing "migrations" Billing set ups are so complex and varied - people ask us for different things all the time We've been putting a ton of work into productionizing this and ca

by @johnyeo_ (John Yeo) · backlist 2026-05-28 · rubric 86.0

44.

The poolside technical report contains some interesting details about quantization. They leverage a rotation tech…

The poolside technical report contains some interesting details about quantization. They leverage a rotation technique called Spinquant. Spinquant is essentially Turboquant’s cousin; TurboQuant rotates the KV cache, SpinQuant R1 rotates ac

by @Halex623 (halex) · backlist 2026-05-28 · rubric 86.0

45.

Simple LLM judges break because long-horizon trajectories do not fit into a context window.

Simple LLM judges break because long-horizon trajectories do not fit into a context window. They either see a narrow slice of the run, or try to ingest a long dense trajectory and miss the evidence in the middle. Agent Judge gives the eva

by @JudgmentLabs (Judgment Labs) · backlist 2026-05-28 · rubric 85.0

46.

democratizing compute with RLMs

democratizing compute with RLMs you don't need a frontier model with a giant context window. even relatively small models get massive gains (they trained an 8B RLM-Qwen3 that beats its base model by ~28% and gets close to much larger mode

by @dosco (spacy) · backlist 2026-05-28 · rubric 84.0

47.

Opus 4.8 is a step back in terms of performance on all Andon Labs’ benchmarks, but a step forward in alignment.

Opus 4.8 is a step back in terms of performance on all Andon Labs’ benchmarks, but a step forward in alignment. Previous Claude models (Opus 4.6+ and Mythos) engage in deceptive and power seeking behavior in its pursuit to win in Vending-B

by @andonlabs (Andon Labs) · backlist 2026-05-28 · rubric 84.0

48.

Announcing AA-WER Streaming, our new benchmark measuring streaming Speech to Text models on accuracy and latency …

Announcing AA-WER Streaming, our new benchmark measuring streaming Speech to Text models on accuracy and latency for voice agent use cases. Pareto optimal models on this new benchmark include those from Cartesia, ElevenLabs, and Deepgram S

by @ArtificialAnlys (Artificial Analysis) · backlist 2026-05-28 · rubric 84.0

49.

The Gentlemen ransomware, a ransomware-as-a-service (RaaS) platform managed and operated by a threat actor that M…

The Gentlemen ransomware, a ransomware-as-a-service (RaaS) platform managed and operated by a threat actor that Microsoft Threat Intelligence tracks as Storm-2697, enables attacks at scale conducted by affiliates.

by @MsftSecIntel (Microsoft Threat Intelligence) · backlist 2026-05-28 · rubric 84.0

50.

Released Polar the new agent RL rollout infra for latest harnesses

by @shizhediao (Shizhe Diao) · backlist 2026-05-28 · rubric 84.0

51.

3 weeks ago we open-sourced HALO

3 weeks ago we open-sourced HALO this led to talking with dozens of teams running agents at scale we realized the current agent monitoring tools aren't built for the future that we so clearly see ahead of us today we’re releasing native

by @samhogan (Sam Hogan) · backlist 2026-05-28 · rubric 84.0

52.

Long-running cybercrime operation distributes cryptocurrency miners through pirated content sites, leveraging fak…

Long-running cybercrime operation distributes cryptocurrency miners through pirated content sites, leveraging fake video player updates to infect millions. Campaign active since 2022 with sophisticated evasion and persistence mechanisms. T

by @DFIR_Radar (DFIR Radar) · backlist 2026-05-28 · rubric 83.0

53.

3 weeks after launch, the feedback on (x.com)

3 weeks after launch, the feedback on @lightseekorg TokenSpeed’s scheduler and kernel design has been encouraging. Kimi K2.5 and Qwen 3.5 reaching speed-of-light performance is amazing. Long road ahead — the lean and small team with high

by @zhyncs42 (zhyncs) · backlist 2026-05-28 · rubric 83.0

54.

Opus 4.8 is live in Shortcut. It is a meaningful upgrade over Opus 4.6/4.7 for spreadsheet work.

Opus 4.8 is live in Shortcut. It is a meaningful upgrade over Opus 4.6/4.7 for spreadsheet work. Will share full eval results soon, but when directly compared to Opus 4.6 on medium effort: Easier eval - 24 wins / 14 losses / 26 ties Harde

by @nicochristie (nico) · backlist 2026-05-28 · rubric 82.0

55.

We are starting to be quite bullish about getting in the data infrastructure business. (x.com)

We are starting to be quite bullish about getting in the data infrastructure business. I just cloned 68 TB (while I only have a 4TB local disk) to my @huggingface training bucket in 1 minute 55 seconds, thanks to Xet deduplication and al

by @julien_c (Julien Chaumond) · backlist 2026-05-28 · rubric 82.0

56.

What's now open alongside the model:

What's now open alongside the model: Fine-tuning scripts Every dataset used to train MolmoAct 2 All of our evaluation rollouts Training recipe for the open source MolmoAct 2 tokenizer

by @allen_ai (Ai2) · backlist 2026-05-28 · rubric 82.0

57.

multi-turn RL and the "tito" problem keeps coming up. we've been working on it for a while, and the takeaway is t…

multi-turn RL and the "tito" problem keeps coming up. we've been working on it for a while, and the takeaway is that it's much easier than people are making it. it takes 1 implementation rule, and 1 chat-template property that all models a

by @QGallouedec (Quentin Gallouédec) · backlist 2026-05-28 · rubric 82.0

58.

MiniMax M3

MiniMax M3 >200B+ MoE 1M context window MSA (MiniMax Sparse Attention) architecture released in a few days open-sourced From a tweet by an official MiniMax team member: Not inside info just public stuff online. Open source mod

by @Elaina43114880 (Elaina) · backlist 2026-05-28 · rubric 82.0

59.

Cold starts are super painful for scaling LLM workers.

Cold starts are super painful for scaling LLM workers. Check out our work at restoring inference workers (including AOT traces) in seconds, not 10s of minutes!

by @KranenKyle (Kyle Kranen) · backlist 2026-05-28 · rubric 82.0

60.

Performance varies meaningfully across the three datasets with different audio lengths, accents, vocabulary, and …

Performance varies meaningfully across the three datasets with different audio lengths, accents, vocabulary, and background noise. On AA-AgentTalk, our private test set, ElevenLabs Scribe v2 Realtime leads both final (2.8%) and partial (2.9

by @ArtificialAnlys (Artificial Analysis) · backlist 2026-05-28 · rubric 81.0

61.

Claude Opus 4.8 is now available in Cursor.

Claude Opus 4.8 is now available in Cursor. On CursorBench, it's able to work much more efficiently than Opus 4.7. We've also found it to be more persistent on harder tasks.

by @cursor_ai (Cursor) · backlist 2026-05-28 · rubric 79.0

62.

How do we get LLMs to solve hard reasoning problems that the base LLM can barely solve?

How do we get LLMs to solve hard reasoning problems that the base LLM can barely solve? We show that through bidirectional search + evolutionary mutations, we can systematically search for complex solutions and posttrain models to solve th

by @du_yilun (Yilun Du) · backlist 2026-05-28 · rubric 79.0

63.

what if you could see how many people downloaded your ai prompts (t.co)

what if you could see how many people downloaded your ai prompts now available on http://traces.com profile pages

by @tarunsachdeva (Tarun Sachdeva) · backlist 2026-05-28 · rubric 78.0

64.

you can in fact frame this as a compression problem where a generator learns to summarize some prior sequence in …

you can in fact frame this as a compression problem where a generator learns to summarize some prior sequence in such a way that minimizes the conditional distribution drift (as measured by kldiv) instead of bolting on a summary prompt post

by @kalomaze · backlist 2026-05-28 · rubric 78.0

65.

Hi all, I defended my PhD thesis. My thesis in two sentences:

Hi all, I defended my PhD thesis. My thesis in two sentences: Current AI measurement takes LLMs as fixed objects, which constrains us to observational measurement. *Spiking* the training data (inserting certain data at known rates), enable

by @johntzwei (Johnny Tian-Zheng Wei) · backlist 2026-05-28 · rubric 78.0

66.

Claude Opus 4.8's system card explains why it's worse on Vending-Bench than Opus 4.7. (x.com)

Claude Opus 4.8's system card explains why it's worse on Vending-Bench than Opus 4.7. Robustness against adversarial agents was indeed one of 4.8's failure modes. Also cool to see that @andonlabs's findings played a small part in making

by @lukaspet (Lukas Petersson) · backlist 2026-05-28 · rubric 78.0

67.

Does your GPT-5.5 also love Valparaíso in Chile !?

Does your GPT-5.5 also love Valparaíso in Chile !? Ask it to “Name a random city in the world”. You might expect a broad sample from thousands of cities. Instead, models collapse to the same small set of answers again and again. But why

by @Amin__Bana (Amin Banayeeanzade) · backlist 2026-05-28 · rubric 78.0

68.

So excited about this project. Despite all the talk about AGI, AI has barely scratched the surface of discovering…

So excited about this project. Despite all the talk about AGI, AI has barely scratched the surface of discovering scientific theories or even giving us new scientific insights. DiscoverPhysics is a benchmark for the future.

by @andrewgwils (Andrew Gordon Wilson) · backlist 2026-05-28 · rubric 78.0

69.

the site saves all collected bird calls for playback. below is a house finch! unbelievably cool to see the range …

the site saves all collected bird calls for playback. below is a house finch! unbelievably cool to see the range these spectrograms cover. now that i’m starting to amass a library of calls i want to try sampling them into some music

by @WarnerTeddy (Teddy) · backlist 2026-05-28 · rubric 78.0

70.

With 104M of image-text pairs, this is one of the largest, if not the largest, openly-licensed image dataset (x.com)

With 104M of image-text pairs, this is one of the largest, if not the largest, openly-licensed image dataset And it's on @huggingface !! Kudos @heyjasperai

by @julien_c (Julien Chaumond) · backlist 2026-05-28 · rubric 78.0

71.

Here’s how we built Town Lake, Cloudflare's unified analytics platform, alongside Skipper, an internal AI agent r…

Here’s how we built Town Lake, Cloudflare's unified analytics platform, alongside Skipper, an internal AI agent running on top of it.

by @Cloudflare · backlist 2026-05-28 · rubric 78.0

72.

It's almost a little boring to see ~no resistance to the generic methods for Go proposal from the OG dependenc…

It's almost a little boring to see ~no resistance to the generic methods for Go proposal from the OG dependency-management-and-syntax-highlighting-is-bad crowd. There's some good ones in here, but few. Nothing from The Commander. Have

by @brandur (Brandur) · backlist 2026-05-28 · rubric 78.0

73.

Kuaishou reports Q1 revenue up 3.4% YoY to ~$5B and Kling AI revenue up 300%+ YoY to ~$96M; Kling reached a ~$500… (x.com)

Kuaishou reports Q1 revenue up 3.4% YoY to ~$5B and Kling AI revenue up 300%+ YoY to ~$96M; Kling reached a ~$500M annualized revenue run rate in March 2026 ( @cocof1026 / South China Morning Post) (Visit Techmeme dot com for the link and

by @Techmeme · backlist 2026-05-28 · rubric 78.0

74.

I've written a tips article on the environment setup method when using the NVIDIA NGC that I normally use on a GP… (t.co)

I've written a tips article on the environment setup method when using the NVIDIA NGC that I normally use on a GPU Cluster. Tips: Development Environment for DL Distributed Learning Library Using Containers | Kazuki Fujii https:// zenn.de

by @kazukifujii (Kazuki Fujii) · backlist 2026-05-28 · rubric 78.0

75.

"Developers can update Claude’s instructions mid-task without breaking the prompt cache or routing the update thr…

"Developers can update Claude’s instructions mid-task without breaking the prompt cache or routing the update through a user turn" wtf? how??

by @swyx · backlist 2026-05-28 · rubric 77.0

76.

Vercel CLI as a self-updating binary with zero external dependencies.

Vercel CLI as a self-updating binary with zero external dependencies. Our CLI is one of the key interfaces enabling the 'cloud for agents'. This solves a huge bottleneck, as we ship changes to our CLI more than ever, and it's embedded in m

by @rauchg (Guillermo Rauch) · backlist 2026-05-28 · rubric 76.0

77.

They don’t compete - I use them together. For example

They don’t compete - I use them together. For example /loop 30m get all the tests to pass. For each review comment, run a triage workflow that writes fixes, and runs 2 adversarial reviews per fix, then applies and pushes

by @jarredsumner (Jarred Sumner) · backlist 2026-05-28 · rubric 76.0

78.

Not sure if this is counterintuitive or not: if your deliverables are further from the code, you get more speed-u…

Not sure if this is counterintuitive or not: if your deliverables are further from the code, you get more speed-ups from coding agents. E.g. if your deliverable is software, you get the least speed-up from coding agents.

by @liuliu (Liu Liu) · backlist 2026-05-28 · rubric 76.0

79.

using codex to run your computer and tasks in a browser in-app or headless feels like magic

by @ShanuMathew93 (Shanu Mathew) · backlist 2026-05-28 · rubric 76.0

80.

State machines are

State machines are The first POC I did with agent driven UIs was literally just giving the agent a reference to the reducer dispatch action and the serialized JSON schema to describe the payload. Worked incredibly well

by @JonasBadalic (Jonas) · backlist 2026-05-28 · rubric 74.0
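
The pattern described in the item above (hand the agent a reducer's dispatch function plus a JSON schema for each payload) can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's code: the `State` fields, the two action types, the `ACTION_SCHEMAS` table, and the hand-rolled `validate` helper are all hypothetical, and a real system would more likely use a full JSON Schema validator.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    count: int = 0
    label: str = ""


# Hypothetical schemas, shown to the agent so it knows what each payload looks like.
ACTION_SCHEMAS = {
    "increment": {"type": "object", "properties": {"by": {"type": "integer"}}, "required": ["by"]},
    "rename": {"type": "object", "properties": {"label": {"type": "string"}}, "required": ["label"]},
}

_PYTHON_TYPES = {"integer": int, "string": str}


def validate(action_type, payload):
    """Tiny subset of JSON Schema checking: required keys and primitive types only."""
    schema = ACTION_SCHEMAS.get(action_type)
    if schema is None:
        raise ValueError(f"unknown action type: {action_type!r}")
    for key in schema["required"]:
        if key not in payload:
            raise ValueError(f"missing required field: {key!r}")
    for key, value in payload.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected field: {key!r}")
        expected = _PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {key!r} must be of type {spec['type']}")


def reducer(state, action):
    """Pure transition function: (state, action) -> new state."""
    validate(action["type"], action["payload"])
    if action["type"] == "increment":
        return replace(state, count=state.count + action["payload"]["by"])
    return replace(state, label=action["payload"]["label"])


class Store:
    def __init__(self):
        self.state = State()

    def dispatch(self, action):
        self.state = reducer(self.state, action)
        return self.state


store = Store()
# An agent would emit JSON text like this after reading ACTION_SCHEMAS.
store.dispatch(json.loads('{"type": "increment", "payload": {"by": 2}}'))
store.dispatch(json.loads('{"type": "rename", "payload": {"label": "cart"}}'))
print(store.state)
```

Because the reducer is the only way to change state, an invalid or unknown action from the agent is rejected before the state is touched.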

81.

New post from (x.com)

New post from @iapsAI on Cyber Superstorms My colleagues argue that counting zero-days is not the way to measure the consequences of AI-accelerated vulnerability Instead, they propose that the community should focus on how often AI-acc

by @DaveRBanerjee (Dave Banerjee) · backlist 2026-05-28 · rubric 74.0

82.

new paper we made serving many different finetunes surprisingly efficient by just… not intervening at decode steps!

by @aryaman2020 (Aryaman Arora) · backlist 2026-05-28 · rubric 74.0

83.

Claude Opus 4.8 is also more efficient than its predecessor - it achieves its higher performance in 15% fewer tur…

Claude Opus 4.8 is also more efficient than its predecessor - it achieves its higher performance in 15% fewer turns per task and with 35% fewer output tokens than Opus 4.7. However, it still uses approximately 30% more turns than OpenAI’s

by @ArtificialAnlys (Artificial Analysis) · backlist 2026-05-28 · rubric 74.0

84.

Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from (x.com)

Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.

by @cartesia (Cartesia) · backlist 2026-05-28 · rubric 74.0

85.

RF-DETR is nearly 2x more accurate than TrackNet, a model developed specifically for detecting small, fast-moving…

RF-DETR is nearly 2x more accurate than TrackNet, a model developed specifically for detecting small, fast-moving objects

by @skalskip92 (SkalskiP) · backlist 2026-05-28 · rubric 74.0

86.

Fake ChatGPT site delivers dual-platform malware targeting Windows and Mac users. Windows victims get credential …

Fake ChatGPT site delivers dual-platform malware targeting Windows and Mac users. Windows victims get credential stealers while Mac users receive $3K/month AMOS malware designed for cryptocurrency theft. Key technical details: • Fake site

by @DFIR_Radar (DFIR Radar) · backlist 2026-05-28 · rubric 74.0

87.

How far behind are open models?

How far behind are open models? Across 17 selected benchmarks, private ones show a gap of 8-10 months today, almost 2x the gap on public ones (4-6 mo). More discussion (including limitations), code and blog in the thread.

by @htihle (Håvard Ihle) · backlist 2026-05-28 · rubric 74.0

88.

excellent blog on how to actually make agents better instead of just benchmaxxing evals. some imp points:

excellent blog on how to actually make agents better instead of just benchmaxxing evals. some imp points: -> benchmaxxing fits tools where a human stays in control and catches mistakes. floor raising fits agents that work alone with no one

by @vivek_2332 (Vivek) · backlist 2026-05-28 · rubric 74.0

89.

People usually learn tries in the context of autocomplete and dictionary problems, but once you start working on … (x.com)

People usually learn tries in the context of autocomplete and dictionary problems, but once you start working on real infra systems, you realize tries are everywhere underneath modern high-performance networking and search stacks. I was re

by @DevanshuXi (Devanshu) · backlist 2026-05-28 · rubric 74.0
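
The post above is cut off before it names a specific system, so the following is only one plausible illustration of a trie in a networking stack: longest-prefix matching of IPv4 routes on a binary trie. The class name `BitTrie`, the route labels, and the helper functions are hypothetical; production routers use compressed or hardware-friendly variants rather than nested dictionaries.

```python
import ipaddress


class BitTrie:
    """Binary trie keyed on address bits; each node is a plain dict."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix_bits, value):
        node = self.root
        for bit in prefix_bits:
            node = node.setdefault(bit, {})
        node["value"] = value

    def longest_prefix_match(self, address_bits):
        node = self.root
        best = node.get("value")  # a /0 default route lives at the root
        for bit in address_bits:
            node = node.get(bit)
            if node is None:
                break
            if "value" in node:
                best = node["value"]  # remember the deepest match seen so far
        return best


def prefix_bits(cidr):
    """Bits of an IPv4 network, truncated to its prefix length."""
    net = ipaddress.ip_network(cidr)
    return format(int(net.network_address), "032b")[: net.prefixlen]


def address_bits(addr):
    """All 32 bits of an IPv4 address as a string of 0s and 1s."""
    return format(int(ipaddress.ip_address(addr)), "032b")


routes = BitTrie()
routes.insert(prefix_bits("0.0.0.0/0"), "default")
routes.insert(prefix_bits("10.0.0.0/8"), "core")
routes.insert(prefix_bits("10.1.0.0/16"), "edge")

for addr in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(addr, "->", routes.longest_prefix_match(address_bits(addr)))
```

The lookup walks at most 32 nodes regardless of how many routes are stored, which is the property that makes the structure attractive for packet forwarding.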

90.

DSPy v3.3.0 beta 1 is released on pypi! We would really appreciate your feedback! (x.com)

DSPy v3.3.0 beta 1 is released on pypi! We would really appreciate your feedback! We are introducing ReActV2 and a much improved LM/BaseLM system, along with a way to pass data to an RLM. Thanks to @MaximeRivest, @kmad, and @mchonede

by @isaacbmiller1 (isaac) · backlist 2026-05-28 · rubric 74.0