Backlist — 27 May 2026 UTC

1.

Biohub releases ESMFold2, ESMC, and ESM Atlas

Open protein models and billion-scale folded-protein atlases give biologists shared infrastructure for structure prediction, design, and discovery

by @biohub · backlist 2026-05-27 · rubric 70.0

2.

Censys found 12,520 exposed MCP servers (t.co)

Thousands of credentialless MCP endpoints show agent tooling is being deployed faster than it is being secured

by @censysio (Censys) · backlist 2026-05-27 · rubric 84.0

3.

Surya OCR 2: 650M params, 91 languages, CPU/GPU/MPS

A small OCR model reached strong document, table, handwriting, math, and layout performance while remaining practical to run locally

Editor’s note: imported_from_x_likes

by @VikParuchuri (Vik Paruchuri) · backlist 2026-05-27 · rubric 96.0

4.

AxiomProver-generated math papers accepted by peer-reviewed journals

Five AI-assisted papers across algebraic geometry, representation theory, number theory, and combinatorics passed peer review in solid math journals

by @axiommathai (Axiom) · backlist 2026-05-27 · rubric 78.0

5.

Ramp deployed 10,000 background agents to security-scan its codebase

A large parallel agent scan on public models found and fixed several high-severity vulnerabilities in a real codebase

by @RampLabs (Ramp Labs) · backlist 2026-05-27 · rubric 97.0

6.

An open-source 300 PPI e-ink portable monitor (t.co)

Modos Flow shipped as a 60Hz touch e-ink monitor with complete MCU firmware, FPGA code, and KiCAD board files

by @zephray_wenting (Wenting mostly on bsky) · backlist 2026-05-27 · rubric 91.0

7.

Firefox Android sandbox escape disclosed as CVE-2026-8945

A sandbox escape affecting Firefox and Firefox Focus for Android reached public disclosure after Mozilla fixed it in Firefox 151

by @NiaAtSyzd (Nia) · backlist 2026-05-27 · rubric 89.0

8.

Parquet standardizes floating-point ordering and NaN statistics

A years-long Parquet effort finally pinned down how floating-point values and NaNs should behave across implementations

by @andrewlamb1111 (Andrew Lamb) · backlist 2026-05-27 · rubric 68.0

9.

How Figma made vector editing up to 10x faster (x.com)

Instanced rendering, faster selected-segment drawing, hit-testing changes, and selection-color gathering produced major speedups in a core design tool

by @figma (Figma) · backlist 2026-05-27 · rubric 56.0

10.

Glassworm botnet disrupted after multi-channel C2 takedown

Glassworm targeted developers through poisoned VS Code extensions, npm packages, and GitHub repos while using Solana, BitTorrent, and Google Calendar for resilient command and control

by @DFIR_Radar (DFIR Radar) · backlist 2026-05-27 · rubric 72.0

11.

AminoWeb: 29 cleaned protein datasets totaling 7.5 TB (x.com)

Protein ML now has a FineWeb-like cleaned dataset bundle covering sequence, structure, and related modalities instead of scattered supplementary tables and FTP mirrors

by @try_litefold (LiteFold) · backlist 2026-05-27 · rubric 82.0

12.

OpenRouter raises $113M while routing 100T tokens a month

A neutral routing layer reached massive token volume and venture scale while the market debates whether model labs will own the whole stack

by @Hadley (Hadley Harris) · backlist 2026-05-27 · rubric 84.0

13.

ByteDance reportedly discussing up to $70B in 2026 AI capex

ByteDance is considering an AI infrastructure buildout financed by tens of billions in annual profit, putting private capex on hyperscaler scale

by @Techmeme · backlist 2026-05-27 · rubric 30.0

14.

BYD’s supplier-financing model faces pressure from Beijing (t.co)

BYD’s rise relied partly on billions owed to suppliers as cheap financing, and regulators are now forcing a cleanup of that balance-sheet strategy

by @edwardwhitenz (Edward White) · backlist 2026-05-27 · rubric 45.0

15.

Robinhood lets users link AI agents to dedicated trading accounts (x.com)

A mainstream brokerage is allowing tools like Claude and Cursor to connect to segregated accounts and place autonomous stock trades

by @Techmeme · backlist 2026-05-27 · rubric 84.0

16.

Chrome bug: number inputs can change unexpectedly on wheel events

Chrome’s handling of wheel listeners around number inputs can mutate values unexpectedly, with the fix not landing until Chrome 150

by @jaffathecake (Jake Archibald) · backlist 2026-05-27 · rubric 42.0

17.

METLIN 960K integrated into Mass Analytica MARS (t.co)

Nearly one million authentic-standard MS/MS spectra are now available inside a metabolomics analysis platform as experimental data rather than predictions or crowdsourcing

by @kadzuis (Gary Siuzdak) · backlist 2026-05-27 · rubric 86.0

18.

Verne Van: adapting robots at customer sites with a few hours of data

A field robot adapted to real customer sites using short onsite data collection, attacking the last-mile gap between robotics demos and deployment

by @neil_nie_ (Neil Nie) · backlist 2026-05-27 · rubric 84.0

19.

RF-DETR lands in Hugging Face Transformers (x.com)

A real-time detection and segmentation model family now ships through Transformers with fine-tuning tutorials and webcam demos

by @skalskip92 (SkalskiP) · backlist 2026-05-27 · rubric 88.0

20.

TACHIOM makes late-interaction retrieval much faster

An open-source architecture tackles the k-means bottleneck in late-interaction retrieval with up to 247x faster clustering and 9.8x faster retrieval

by @SilvioMartinico (Silvio Martinico) · backlist 2026-05-27 · rubric 86.0

21.

Chalcedon: memory-efficient Butina clustering for chemical datasets

A chemical-data splitter that once required multi-terabyte-memory machines can now run robustly on consumer hardware

by @manntis4 (Eli Mann) · backlist 2026-05-27 · rubric 93.0

22.

Bypassing ASLR and NX on ARM64 with two cooperating bugs (t.co)

A practical ARM64 walkthrough shows how an information leak and a second bug combine into code execution even though either bug alone is harmless

by @8kSec · backlist 2026-05-27 · rubric 91.0

23.

AI boom strains optical communications supply chains (t.co)

After memory chips and CPUs, AI demand is now disrupting optical communications components that data centers need to scale

by @NikkeiAsia (Nikkei Asia) · backlist 2026-05-27 · rubric 67.0

24.

A game jam constrained to 1.44 MB (t.co)

A development contest asks entrants to fit complete games into the size of a single floppy disk while still allowing modern engines if they fit

by @AUTOMATONJapan (AUTOMATON（オートマトン）) · backlist 2026-05-27 · rubric 14.0

25.

Samsung preferred/common spread hits a record

Samsung common shares trade at a 63% premium to economically similar preferred shares, implying a large valuation gap inside the same company

by @blondesnmoney (Cluseau Investments) · backlist 2026-05-27 · rubric 62.0

26.

Finnish daycare yards got forest dirt; children’s blood tests changed

Researchers replaced gravel with forest soil and grass at daycare yards and saw measurable immune-related changes in children within a month

by @anishmoonka (Anish Moonka) · backlist 2026-05-27 · rubric 6.0

27.

Toward a virtual optical telescope with a 1.5 km aperture (x.com)

Quantum-tech advances are enabling optical interferometry schemes that could synthesize telescope apertures far larger than any single mirror

by @LeeBillings (Lee Billings) · backlist 2026-05-27 · rubric 6.0

28.

Retired electron microscope hand panels used as a game controller

Two old TEM control panels were reverse-engineered into a working game controller, preserving obscure instrument muscle memory in software

by @IMBalENce (Zhou Xu 徐洲) · backlist 2026-05-27 · rubric 72.0

29.

A first Lean project with mathlib weighs 7 GB (t.co)

Installing Lean through VS Code and creating a starter mathlib project produced a seven-gigabyte directory, revealing the cost of modern formal-math tooling

by @arntzenius (rntz) · backlist 2026-05-27 · rubric 62.0

30.

America’s missile bottleneck is solid rocket motors

A great-power conflict could exhaust missile stocks quickly, and simply founding more missile startups does not solve the brittle solid-rocket-motor supply chain

by @kwharrison13 (Kyle Harrison) · backlist 2026-05-27 · rubric 24.0

31.

Found a local privilege escalation on the latest Linux. Reported to the vendor, awaiting CVE. Writeup after the f…

Found a local privilege escalation on the latest Linux. Reported to the vendor, awaiting CVE. Writeup after the fix lands.

by @ikotas00 (株式会社Ikotas Labs) · backlist 2026-05-27 · rubric 96.0

32.

auto: [Android 5.15] KASAN: use-after-free Read in atime_needs_update: Detected use-after-free in atime_needs_upd… (t.co)

auto: [Android 5.15] KASAN: use-after-free Read in atime_needs_update: Detected use-after-free in atime_needs_update function, leading to a read of size 4 in fs/inode.c. Issue found in task syz.1.1270 during unlink syscall. link: https:/

by @0x_shaq (faulty *ptrrr) · backlist 2026-05-27 · rubric 96.0

33.

We evaluated Gandalf, our agentic judge, on a new meta-evaluation dataset called BankerVerifierBench (BVB), built…

We evaluated Gandalf, our agentic judge, on a new meta-evaluation dataset called BankerVerifierBench (BVB), built on top of BankerToolBench (BTB), a long-time-horizon investment-banking benchmark. Gandalf achieves the highest performance an

by @anishathalye (Anish Athalye) · backlist 2026-05-27 · rubric 95.0

34.

Built a speculative decoding inference engine in Triton.

Built a speculative decoding inference engine in Triton. (More tests still ongoing.) For now, I've tested with the GPT family because that's what I can run on my personal GPU (4 GB), and I'm outperforming SGLang on both throughput and correctn

by @mohitwt_ (mohit) · backlist 2026-05-27 · rubric 94.0

35.

The unlock was a self-improvement loop.

The unlock was a self-improvement loop. We record production misses: unsupported fields, wrong predictions, and corrections. Codex then uses that context to autonomously create evals from production data, hillclimb against them, and open

by @samaysham (Samay) · backlist 2026-05-27 · rubric 94.0

36.

A single Meta engineer burned roughly $500K/month in Token consumption (about 300 billion tokens / month) on the …

A single Meta engineer burned roughly $500K/month in Token consumption (about 300 billion tokens / month) on the company's internal "Claudeonomics" leaderboard that ranked employees by Token usage. The leaderboard ran from March, employee

by @sheriyuo (Xiuyu Li) · backlist 2026-05-27 · rubric 94.0

37.

with the Lightcone API you can run 50 fast computer use agents in parallel, each on its own machine. model and in…

with the Lightcone API you can run 50 fast computer use agents in parallel, each on its own machine. model and infra via one API. we built a demo that summarizes news and maps it.

by @tzafon_company (Tzafon) · backlist 2026-05-27 · rubric 92.0

38.

The speed-of-light optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achiev…

The speed-of-light optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs. In the PyTorch Foundation's latest com

by @PyTorch · backlist 2026-05-27 · rubric 92.0

39.

There are so many sandbox providers out there.

There are so many sandbox providers out there. Why is there no first-class integration into Claude Code, Codex, Opencode, etc? I just want to spin up 10 agents with the full ability to run my local code independently and communicate with

by @damian_b (Damian Barabonkov) · backlist 2026-05-27 · rubric 92.0

40.

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address t…

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (

by @Tiberiu_Musat_ (Tiberiu Mușat) · backlist 2026-05-27 · rubric 92.0

41.

Laguna M.1/XS.2 tech report from Poolside has lots of details on their infrastructure!

by @JordanNanos (Jordan Nanos) · backlist 2026-05-27 · rubric 92.0

42.

slime was built for agentic RL from day 0.

slime was built for agentic RL from day 0. We added an Agentic RL Training Roadmap that brings together the pieces already in slime for agent workflows: custom generation, verifier/test-based rewards, fan-out samples, async rollout, SGLang

by @slime_framework (slime) · backlist 2026-05-27 · rubric 92.0

43.

So awesome, Liang Sheng! I burned through 460 million tokens in a single day, and it only cost me about 30 RMB

So awesome, Liang Sheng! I burned through 460 million tokens in a single day, and it only cost me about 30 RMB. If this were Opus or something, it might easily run up 700 bucks or more, who knows, but the actual performance here is pretty

by @silsrc (scr.c) · backlist 2026-05-27 · rubric 92.0

44.

MiniMax-M2 paper just dropped

MiniMax-M2 paper just dropped. The key focus of M2 is on something more agent-native. It trains on runnable workspaces and artifact-grounded rewards, then uses Forge to scale RL over long coding, app, search, and office-task trajectories.

by @askalphaxiv (alphaXiv) · backlist 2026-05-27 · rubric 92.0

45.

A dangerous Windows Kernel EoP vulnerability allows browser sandbox escapes. Public PoC exploit code is available… (t.co)

A dangerous Windows Kernel EoP vulnerability allows browser sandbox escapes. Public PoC exploit code is available on GitHub. #Windows11 #Infosec #KernelExploit #CVE202640369 https://securityonline.info/windows-kernel-eop-vulnerability-p

by @the_yellow_fall (Gray Hats) · backlist 2026-05-27 · rubric 92.0

46.

Things that DeepSWE does well on (for long horizon benchs out there):

Things that DeepSWE does well on (for long-horizon benchmarks out there): a 0.3% false-positive rate vs SWE-Bench Pro's 8.5%, with an independent LLM-analyzer audit on every trial; pretty good contamination resistance, as seen from the canary GUID; fairly

by @SeanZCai (Sean Cai) · backlist 2026-05-27 · rubric 92.0

47.

1/ Today we're releasing AttuneBench, the first open EQ benchmark grounded in real multi-turn human-model convers… (x.com)

1/ Today we're releasing AttuneBench, the first open EQ benchmark grounded in real multi-turn human-model conversations, scored against what the person actually felt and wanted at each turn. Built by the research team at @pareto_ai in co

by @phoebeyao (Phoebe Yao) · backlist 2026-05-27 · rubric 91.0

48.

Grading agent rollouts in rubric-graded RL environments is itself a hard task.

Grading agent rollouts in rubric-graded RL environments is itself a hard task. Prior approaches pass serialized artifacts or agent trajectories to an LLM judge; this loses information / doesn't support sophisticated criteria. In contrast,

by @anishathalye (Anish Athalye) · backlist 2026-05-27 · rubric 91.0

49.

DwarfStar prefill is a hell faster after merging (t.co)

DwarfStar prefill is a hell faster after merging https://github.com/antirez/ds4/pull/264 …, I need to update the README benchmarks as they no longer mean much :D after a +40% prefill speed boost.

by @antirez · backlist 2026-05-27 · rubric 91.0

50.

SGLang v0.5.12.post1 is live

SGLang v0.5.12.post1 is live. This is a stability patch on top of v0.5.12, with 12 cherry-picks focused on DeepSeek V4, NIXL PD disaggregation, and Blackwell. DeepSeek V4: Fixed V4-Pro garbled text on single-token decode (B200/B300). Fixed

by @lmsysorg (LMSYS Org) · backlist 2026-05-27 · rubric 91.0

51.

EAGLE 3.1 is out. The team identified attention drift as the root cause of acceptance-length degradation at deepe…

EAGLE 3.1 is out. The team identified attention drift as the root cause of acceptance-length degradation at deeper speculation steps. Fix: FC normalization + post-norm hidden-state feedback. Result: 2x longer acceptance length in long-cont

by @RedHat_AI (Red Hat AI) · backlist 2026-05-27 · rubric 91.0

52.

shipped (x.com)

shipped @getbuzzr/dfs-engine 4.0 today — strict settlement contracts, typed invariant errors, hardened payout math for PrizePicks/Underdog-style DFS grading. 4 new companions also at 1.0: dfs-cli, dfs-react, dfs-provider-sportradar, dfs

by @sarveshsea · backlist 2026-05-27 · rubric 91.0

53.

rtk: a Rust CLI proxy that cuts Claude Code token usage by 60–90%.

Editor’s note: imported_from_x_likes

rtk: a Rust CLI proxy that cuts Claude Code token usage by 60–90%. It filters output from git, tests, lint, kubectl etc. before it hits the LLM context. A pre-bash hook reroutes commands like git status through rtk to strip redundant info.

by @sdusteric (Eric Lee) · backlist 2026-05-27 · rubric 90.0

54.

we just shipped sandbox-sdk v0.10.2 today

Editor’s note: imported_from_x_likes

we just shipped sandbox-sdk v0.10.2 today: cloudflare tunnels support, mounting R2 buckets directly from worker bindings, and isolated exec() calls. small release, but a lot of quality-of-life improvements for agents running in containers

by @whoiskatrin (kate) · backlist 2026-05-27 · rubric 90.0

55.

New paper on activation mixing. The authors evaluate several mixing strategies across both classical FFNs and Swi… (t.co)

New paper on activation mixing. The authors evaluate several mixing strategies across both classical FFNs and SwiGLU FFNs, with ablations on dense and MoE models. One interesting result: the most expressive mixing strategy isn’t the best ch

by @f14bertolotti (Francesco Bertolotti) · backlist 2026-05-27 · rubric 90.0

56.

Can current code agents survive beyond single-repo bug fixing? (t.co)

Can current code agents survive beyond single-repo bug fixing? BeyondSWE: 500 real-world tasks from 246 GitHub repos, covering cross-repo issues, domain-specific fixes, dependency migration, and doc-to-repo generation. https://arxiv.or

by @RUC_AIBox (AI Box) · backlist 2026-05-27 · rubric 90.0

57.

ECHO paper + code are now live!

ECHO paper + code are now live! We open-sourced a small SkyRL-based implementation of "world loss" for terminal-agent RL. GRPO trains on what the agent did. ECHO also learns from what the terminal said next. Same rollout. Same policy fo

by @VaishShrivas (Vaish Shrivastava) · backlist 2026-05-27 · rubric 88.0

58.

Gandalf code: (t.co)

Gandalf code: https://github.com/Handshake-AI-Research/gandalf-the-grader … Blog post with details: https://joinhandshake.com/research/ai/gandalf-the-grader/ …

by @anishathalye (Anish Athalye) · backlist 2026-05-27 · rubric 88.0

59.

New: grep for exact matching

New: grep for exact matching. grep → keyword / regex matching; search → fine-grained semantic retrieval. Works across uploaded content, including text, PDFs (OCR), and audio/video (transcription). Give your agents both retrieval primitives t

by @mixedbreadai (Mixedbread) · backlist 2026-05-27 · rubric 88.0

60.

Tasks require agents to investigate Kubernetes incident snapshots through shell commands and submit a structured …

Tasks require agents to investigate Kubernetes incident snapshots through shell commands and submit a structured JSON diagnosis identifying the responsible root-cause entities. In one public SRE task, the agent sees user-facing failures in

by @ArtificialAnlys (Artificial Analysis) · backlist 2026-05-27 · rubric 88.0

61.

The bottleneck in LLM inference isn't compute. It's how fast you can move the weights. (x.com)

The bottleneck in LLM inference isn't compute. It's how fast you can move the weights. Our CTO Mathias Lechner, @mlech26l, joins Piotr Mazurek, @tugot17, from our inference team, to discuss what actually limits token throughput and how

by @liquidai (Liquid AI) · backlist 2026-05-27 · rubric 88.0

62.

The MiniMax M2 series was one of the most widely used open-weight LLM series earlier this year. Now, we got a tec…

The MiniMax M2 series was one of the most widely used open-weight LLM series earlier this year. Now, we got a technical report with some interesting tidbits. I summarized some of them below: 1. Full attention as an anti-trend?: They tried

by @rasbt (Sebastian Raschka) · backlist 2026-05-27 · rubric 88.0

63.

Most AI products still reset after deployment.

Most AI products still reset after deployment. Trajectory is building something more interesting: AI systems that continuously learn from real usage. Every correction, retry, and edit becomes training signal instead of wasted data. They’

by @TechByMarkandey (Markandey Sharma) · backlist 2026-05-27 · rubric 88.0

64.

Early on, Tax AI handled simpler returns. By season’s end, it processed K-1s, rentals, LLCs, deductions, and more.

Early on, Tax AI handled simpler returns. By season’s end, it processed K-1s, rentals, LLCs, deductions, and more. At launch, ~25% of returns hit 75%+ field completion. Six weeks later: 86%. Now it drafts returns with up to 97% accuracy,

by @samaysham (Samay) · backlist 2026-05-27 · rubric 88.0

65.

Behind the build of self-improving tax agents with Codex (x.com)

Behind the build of self-improving tax agents with Codex: We co-built Tax AI with @ThriveHoldings around tax prep workflows so when reviewers fix any errors, Codex can trace the failure, improve the system, and test the change before it

by @OpenAIDevs (OpenAI Developers) · backlist 2026-05-27 · rubric 88.0

66.

Introducing a minimal training harness built on prime-rl and verifiers, so you can now train your own RLMs withou…

Introducing a minimal training harness built on prime-rl and verifiers, so you can now train your own RLMs without sandboxes! All available in the `training/` folder in the RLM GitHub repo! We train RLM-Qwen3-30B-A3B-v0.1, using RL on a se

by @a1zhang (alex zhang) · backlist 2026-05-27 · rubric 88.0

67.

Yesterday I received an email to notify me of a case that looked like a malicious Google sponsored ad result. I t…

Yesterday I received an email to notify me of a case that looked like a malicious Google sponsored ad result. I tried to make sense of it, unraveling some obfuscated JavaScript, then stages of Batch and PowerShell (with some interesting cod

by @_JohnHammond (John Hammond) · backlist 2026-05-27 · rubric 88.0

68.

Per (x.com)

Per @EpochAIResearch, the world's Blackwell GPUs can produce roughly 500M–20B output tokens per second today, depending on context length. Inference capacity is growing 3.4x/year. Token demand is growing 10x/year. Long-context workloads

by @ShanuMathew93 (Shanu Mathew) · backlist 2026-05-27 · rubric 88.0

69.

Everything you always wanted to know about Transformers.js, in one video.

Everything you always wanted to know about Transformers.js, in one video. I made a deep dive into how AI models run from JavaScript: tensors, ONNX, quantization, `pipeline()`, WebGPU/WASM, preprocessing, postprocessing, and what happens un

by @nicodotdev (Nico Martin) · backlist 2026-05-27 · rubric 88.0

70.

Someone debugged for half a day, only to find their RL was forever stuck at

Someone debugged for half a day, only to find their RL was forever stuck at (EntropyTaskRunner pid=x) self.use_critic = need_critic(self.config) Turns out this pig very thoughtfully reused the same submit_task.sh, allocating a full 16

by @sheriyuo (Xiuyu Li) · backlist 2026-05-27 · rubric 88.0

71.

And poof! just like that. all that obscurity to hide feature extraction/heuristic logic/verdict weights means fuc… (t.co)

And poof! just like that. all that obscurity to hide feature extraction/heuristic logic/verdict weights means fuck all now, and I'm so happy those prickly vendors. https://trustedsec.com/blog/the-defensive-stack-is-exposed … - @HackingLZ

by @simplylurking2 (wallfacer) · backlist 2026-05-27 · rubric 88.0

72.

Figuring out how to benchmark agents on realistic biology research has quickly become one of my favorite types of…

Figuring out how to benchmark agents on realistic biology research has quickly become one of my favorite types of engineering work. You work with scientists to get to the core of some biological claim, precisely assembling raw data/prior li

by @kenbwork (Kenny Workman) · backlist 2026-05-27 · rubric 88.0

73.

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval Microsoft introduces a recipe to distill large SLM retrievers into compact query encoders for Bing Ads.

by @_reachsumit (Sumit) · backlist 2026-05-27 · rubric 88.0

74.

AutoResearch AI This paper is definitely worth reading.

AutoResearch AI This paper is definitely worth reading. It's not about the single-point capability of "AI helping you summarize papers," but a bigger trend: research is moving from task-level AI to workflow-level AI. In other words, AI in

by @Xudong07452910 (Xudong Han) · backlist 2026-05-27 · rubric 88.0

75.

Tired of benchmarking your optimizer on Hartmann and Branin? Try BoLT , our new black-box optimization (BBO) benc…

Tired of benchmarking your optimizer on Hartmann and Branin? Try BoLT, our new black-box optimization (BBO) benchmark grounded in 20K+ real LLM experiments instead! LLMs involve expensive, derivative-free decisions that BBO is built to h

by @bryanklow (Bryan Kian Hsiang Low) · backlist 2026-05-27 · rubric 88.0

76.

Unbelievable that I built the fastest, most complete MP4 parser in the world and just keep it in a private repo

Unbelievable that I built the fastest, most complete MP4 parser in the world and just keep it in a private repo. Haven't worked on it lately, but it's in a great state. It has: io_uring, WASM, strict ISO mode, 100% required boxes impl

by @wavefnx · backlist 2026-05-27 · rubric 86.0

77.

brooo trust me kimi is like gpt-5.5 but faster and cheaper, just let me add one more gpu to my local cluster bro,…

brooo trust me kimi is like gpt-5.5 but faster and cheaper, just let me add one more gpu to my local cluster bro, I promise it’ll be even faster and better

by @rafalwilinski (Rafal Wilinski) · backlist 2026-05-27 · rubric 86.0

78.

Every millisecond matters. We’re open sourcing the tokenizer we built and deployed on production; that’s far effi…

Every millisecond matters. We’re open sourcing the tokenizer we built and deployed in production; it’s far more efficient than Hugging Face and SentencePiece.

by @AravSrinivas (Aravind Srinivas) · backlist 2026-05-27 · rubric 86.0

79.

Check out the new ESM models we’ve been building at (x.com)

Check out the new ESM models we’ve been building at @biohub! ESMC + ESMFold2 are open-source SOTA for protein structure prediction and design. Plus: an interactive atlas of 6.8B+ proteins!

by @alishbaimran_ (Alishba Imran) · backlist 2026-05-27 · rubric 86.0

80.

Hey, mom, I did a thing!

Hey, mom, I did a thing! Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use | Proceedings of the ACM Conference on AI and Agentic Systems

by @franciscojarceo (Francisco Javier Arceo) · backlist 2026-05-27 · rubric 86.0

81.

Today, (x.com)

Today, @MichaelElabd, @QuantumArjun, and I are excited to announce Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage,

by @rronak_ (Ronak Malde) · backlist 2026-05-27 · rubric 86.0

82.

AI attackers have terrible OPSEC.

AI attackers have terrible OPSEC. Use it against them. Hallucinate exposed services. Waste their tokens. Seed prompt-injection traps, canaries, and honeytokens where attacker LLM will read them. Have fun.

by @julianor (Juliano Rizzo) · backlist 2026-05-27 · rubric 86.0

83.

"It's easier to tune the LR for method A than for B." (t.co)

"It's easier to tune the LR for method A than for B." We tried to formalize this for model-based stochastic optimization methods. We find a key quantity, called stability index, that describes how stable a (weakly) convex bound is as a fu

by @FSchaipp (Fabian Schaipp) · backlist 2026-05-27 · rubric 86.0

84.

First time I've seen a coding agent do this: GPT 5.5 bumped resource allocation to unblock itself, and then went …

First time I've seen a coding agent do this: GPT 5.5 bumped resource allocation to unblock itself, and then went back and tuned it in a polish pass I didn't even ask for.

by @anveio (Shovon Hasan) · backlist 2026-05-27 · rubric 86.0

85.

TyphoonPWN 2026 Unpwned

TyphoonPWN 2026 Unpwned Found a bug in the "ipTime Router WAN PreAuth Remote Code Execution" category ($10,000) using an LLM and reported it in February for TyphoonPWN 2026. Unfortunately, it was patched in March before the event. #TyphoonC

by @satoki00 (Satoki@Kn0wl3dg3) · backlist 2026-05-27 · rubric 86.0

86.

Agentic kernel generation has mostly focused on a few hot kernels — MLA, GDN, sparse attention, etc.

Agentic kernel generation has mostly focused on a few hot kernels — MLA, GDN, sparse attention, etc. But there is a massive set of classical ML operators that still haven’t received the same level of attention. That’s what makes Flashlib exci

by @HaochengXiUCB (Haocheng Xi) · backlist 2026-05-27 · rubric 86.0

87.

Apple finally published this. I found a bug in `awdd` that exposed `AWDMetadata.bin` and their response was to st…

Apple finally published this. I found a bug in `awdd` that exposed `AWDMetadata.bin` and their response was to straight-up remove the daemon entirely. Very interesting!

by @wtsdev (Watch This Space) · backlist 2026-05-27 · rubric 86.0

88.

[1/5] Works on test set contamination focus on detection, but we show correction of inflated test scores is pos… (t.co)

[1/5] Works on test set contamination focus on detection, but we show *correction* of inflated test scores is possible. https://arxiv.org/abs/2605.24818 Our proposal is to spike the training data and insert some test examples at known rat

by @johntzwei (Johnny Tian-Zheng Wei) · backlist 2026-05-27 · rubric 86.0

89.

LLMs represent concepts as vectors. Strikingly, taxonomies (organism → animal → bird) appear as hierarchies in em… (x.com)

LLMs represent concepts as vectors. Strikingly, taxonomies (organism → animal → bird) appear as hierarchies in embedding space. Led by my student @AndresNava, we show this comes from co-occurrence statistics alone. http://arxiv.org/abs/

by @MatthieuWyart (Matthieu wyart) · backlist 2026-05-27 · rubric 85.0

90.

I really appreciate the lessons and technical ideas (x.com)

I really appreciate the lessons and technical ideas @samaysham & team were able to share about their tax agent system, which learns from production traces to self-improve via detailed tracing tightly integrated into deployment + an autono

by @thesephist (Linus) · backlist 2026-05-27 · rubric 84.0
