Backlist — 26 May 2026 UTC

1.

Using time-travel debugging and an AI agent on a 7B-instruction Android trace

An AI-assisted debugger traced a noisy ARM64 Android execution path through MTProto v2 decryption to AES-IGE in about ten minutes

by @eshard (eShard) · backlist 2026-05-26 · rubric 98.0

2.

npm is making package publishing part of the security boundary

2FA-gated publishing and install controls move registries from passive package stores toward active defenses against compromised maintainers and malicious releases

by @asadeddin (Ahmad Sadeddin) · backlist 2026-05-26 · rubric 88.0

3.

Decima predicts cell-type and disease-state gene expression from DNA (x.com)

A sequence-to-function model trained on more than 22M single cells predicts gene expression in specific cell types and disease states directly from DNA sequence

by @gokcen · backlist 2026-05-26 · rubric 86.0

4.

Choosing storage formats for robot learning data

Robot learning datasets stress storage systems differently from analytics workloads, forcing tradeoffs among Parquet, MCAP, Lance, NCore, and purpose-built formats

by @rerundotio (Rerun) · backlist 2026-05-26 · rubric 91.0

5.

Clarifying questions can make agents vulnerable to prompt injection

Across frontier models, asking clarifying questions turned apparently robust execution settings into prompt-injection attack success rates above 30% for several systems

by @ScaleAILabs (Scale Labs) · backlist 2026-05-26 · rubric 95.0

6.

Async mechanisms in modern GPU kernels

Modern GPU kernels now span multiple scheduling regimes, raising fresh questions about how kernel DSLs should expose asynchronous execution

by @ianbarber (Ian Barber) · backlist 2026-05-26 · rubric 94.0

7.

Amazon’s $350M Annapurna acquisition became a $20B chip business

Annapurna Labs now designs AWS chips including Graviton, Trainium, and Nitro, which Amazon says exceed $20B in run-rate revenue

by @tanayj (Tanay Jaipuria) · backlist 2026-05-26 · rubric 86.0

8.

PerturbSpace: spatial CRISPR screens on standard single-cell workflows (x.com)

PerturbSpace aims to combine spatially resolved multimodal readouts with whole-transcriptome CRISPR screens without leaving standard single-cell workflows

by @arcinstitute (Arc Institute) · backlist 2026-05-26 · rubric 84.0

9.

Interrupt handling on Windows on ARM (t.co)

A detailed Windows ARM64 interrupt-handling deep dive fills in low-level mechanics that are rarely documented for researchers and exploit developers

by @0xor0ne · backlist 2026-05-26 · rubric 86.0

10.

Hermeus reaches Mach 1.21 with a private unmanned jet

The privately developed, unmanned Quarterhorse Mk 2.1 went supersonic, marking rapid progress from founding to Mach 1+ flight

by @hermeuscorp (Hermeus) · backlist 2026-05-26 · rubric 67.0

11.

Why parlays are hard for peer-to-peer prediction markets

In a peer-to-peer parlay market, a tiny bet with a huge payout can force a market maker to lock the entire payout as collateral until resolution

by @ImTheBigP (Pravesh) · backlist 2026-05-26 · rubric 84.0

12.

GitHub CLI tokens as a supply-chain attack surface

Long-lived GitHub CLI tokens stored on developer machines can be stolen by malicious scripts and used to escalate supply-chain incidents

by @jiahan_c (Jiahan Chen) · backlist 2026-05-26 · rubric 74.0

13.

Chao1 cardinality estimation

A small sample’s singleton and doubleton counts can estimate the total number of unique items in a large dataset with a simple formula

by @bribrisimps (bri) · backlist 2026-05-26 · rubric 0.0
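
A minimal sketch of the estimator named in this item, assuming the classic Chao1 form (observed uniques plus f1²/(2·f2), where f1 and f2 are the singleton and doubleton counts) and the usual bias-corrected fallback when there are no doubletons. The function name `chao1` and the toy sample are illustrative and are not taken from the linked post.

```python
from collections import Counter


def chao1(sample):
    """Estimate the total number of distinct items from a sample (Chao1)."""
    counts = Counter(sample)                  # item -> times seen in the sample
    freq_of_freq = Counter(counts.values())   # k -> number of items seen k times
    s_obs = len(counts)                       # distinct items actually observed
    f1 = freq_of_freq.get(1, 0)               # singletons
    f2 = freq_of_freq.get(2, 0)               # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Assumed fallback: bias-corrected form, which stays defined when f2 == 0.
    return s_obs + f1 * (f1 - 1) / 2


# Toy sample: a and b seen twice, c and d once, e three times.
print(chao1(list("aabbcdeee")))
```

Many singletons relative to doubletons suggest that a lot of items are still unseen, so the estimate rises above the observed count; with no singletons it collapses to the observed count.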

14.

Meta’s experience with multi-datacenter training (t.co)

Meta’s report describes multi-datacenter training techniques including a pipeline-parallel schedule designed to work with ZeRO-2/3-style optimization

by @rosinality (Rosinality) · backlist 2026-05-26 · rubric 90.0

15.

VERVE-102: one-shot base editing to lower LDL cholesterol

Eli Lilly presented data on a single-infusion base-editing therapy targeting cholesterol biology, pointing toward durable cardiovascular-risk reduction

by @afshineemrani (Afshine Emrani MD FACC) · backlist 2026-05-26 · rubric 79.0

16.

WUJI Glove: a 60g tactile data glove for robot teleoperation

A lightweight tactile glove with 800Hz IMU data, 526 pressure points, and sub-2mm motion accuracy targets teleoperation and imitation-learning data collection

by @jerryhuang01 (Jerry Huang) · backlist 2026-05-26 · rubric 78.0

17.

Most chain-of-thought faithfulness detectors perform near chance

Eight proposed methods for detecting unfaithful chains of thought mostly failed when tested against ground-truth faithfulness labels

by @GurYoav (Yoav Gur Arieh) · backlist 2026-05-26 · rubric 90.0

18.

The first drop of ink effect in long-context LLMs

A small amount of distracting information in a long context can cause a discontinuous performance drop rather than a smooth degradation

by @muhan_gao (Muhan Gao) · backlist 2026-05-26 · rubric 86.0

19.

What California billionaires effectively pay in tax (t.co)

New research estimates the wealth and effective tax rates of California’s 200 billionaires, including founders behind Meta and Alphabet

by @gabriel_zucman (Gabriel Zucman) · backlist 2026-05-26 · rubric 78.0

20.

The AI sector may be growing 2,000% a year while GDP misses it (x.com)

Quality-adjusted AI output can expand extremely quickly while remaining nearly invisible in standard GDP statistics, creating a policy measurement gap

by @akorinek (Anton Korinek) · backlist 2026-05-26 · rubric 76.0

21.

The gap between dollars priced in FX markets and dollars that can move

Covered interest parity describes a clean pricing relationship, but real FX stress often comes from whether dollars can actually move through collateral and settlement plumbing

by @borjaneira_ (neira) · backlist 2026-05-26 · rubric 73.0

22.

Passing fetch to the server with capnweb

A tiny wrapper around capnweb lets a client pass a fetch function to a server so the server can fetch back into the client

by @jonas (Jonas Templestein) · backlist 2026-05-26 · rubric 84.0

23.

A Red Hat Enterprise Linux SELinux 0day

A researcher published a RHEL zero-day originally prepared for Pwn2Own Berlin, reopening the question of how much SELinux containment matters in 2026

by @rewhiles (rewhile) · backlist 2026-05-26 · rubric 64.0

24.

Formal Frontier: open-source AI autoformalization for mathematics (x.com)

The Mathlib Initiative is launching a project to make AI-driven autoformalization genuinely useful for researchers while keeping the work open source

by @tkalil2050 (Tom Kalil) · backlist 2026-05-26 · rubric 74.0

25.

TritonMoE: a fused MoE dispatch kernel in Triton (t.co)

TritonMoE implements the full MoE forward dispatch path with portable OpenAI Triton primitives instead of relying on custom vendor-specific kernels

by @Underfox3 (Underfox) · backlist 2026-05-26 · rubric 92.0

26.

A Waymo Jaguar carries about 1,100 pounds of sensors and compute

A registered Waymo I-PACE weighs roughly 1,100 pounds more than the stock vehicle, implying sensors and compute that weigh as much as several passengers

by @iAligator (Ali Haghani) · backlist 2026-05-26 · rubric 84.0

27.

Orbital data centers may become cost-competitive in 3–5 years

If Starship launch costs fall enough, orbital data centers could become cost-competitive with today’s terrestrial data centers within a few years, though they would not be a major compute source before 2030

by @ahall_research (Andy Hall) · backlist 2026-05-26 · rubric 72.0

28.

Holocron: an open-source, self-hostable Mintlify alternative (t.co)

Holocron implements Mintlify-style docs as a Vite plugin that can be self-hosted on Vercel, Cloudflare, Docker, or other targets

by @__morse (Tommy D. Rossi) · backlist 2026-05-26 · rubric 84.0

29.

Parse 2.0: document parsing API for messy PDFs

Parse 2.0 targets forms, tables, handwriting, and scans by converting messy PDFs into markdown that downstream agents can act on

by @jordanalexmeyer (Jordan Meyer) · backlist 2026-05-26 · rubric 82.0

30.

We want to move the LP closer to the ILP. We find some cut (constraint) that is violated by the LP solution but w…

We want to move the LP closer to the ILP. We find some cut (constraint) that is violated by the LP solution but wouldn't be violated by any integer solution. We then solve the new LP and iterate. Our LP lower bound increases, and the rounde

by @unixpickle (Alex Nichol) · backlist 2026-05-26 · rubric 96.0

31.

@BowenWangNLP (x.com)

@BowenWangNLP et al. dropped 32,122 verifiable RLVR tasks for training CUA agents, about 87x the number of OSWorld tasks. Large enough to experiment with CUA RL scaling

by @guohao_li (Guohao Li) · backlist 2026-05-26 · rubric 96.0

32.

If you can’t eval a thing easily it’s a product smell

If you can’t eval a thing easily it’s a product smell. What you need from an analytics AI is proof that the number can be trusted. This means verifying quantities used against trusted reports, dashboards, prior analysis, etc. to ensure n

by @HamelHusain (Hamel Husain) · backlist 2026-05-26 · rubric 96.0

33.

1/ We spent the last few days integrating Centaur (t.co)

1/ We spent the last few days integrating Centaur https://github.com/paradigmxyz/centaur … into Pareto Credit as an internal AI teammate. This is one of the first AI infra projects that actually made me think: “ok, this is how company a

by @bugduino (William | bugduino.eth) · backlist 2026-05-26 · rubric 96.0

34.

AlphaProof Nexus advancing research math, solving 9 Erdős problems & more!

AlphaProof Nexus advancing research math, solving 9 Erdős problems & more! Amazing experience to be part of this team & project. Excited for AI-driven formal proof search becoming a collaborator in math discovery, one that deepens human und

by @AnjaSurina (Anja Surina) · backlist 2026-05-26 · rubric 96.0

35.

LongLive 2.0 gets another speed boost!

LongLive 2.0 gets another speed boost! We further optimized the NVFP4 inference path, improving overall throughput by 18.6%. A 64s video now takes just 30.6s end-to-end, including VAE decoding. That’s over 2x real-time generation. Hi

by @AaronWeiHuang (Aaron Huang) · backlist 2026-05-26 · rubric 96.0

36.

Your CFO when you spend $300M on Claude because you have no routing logic

by @the_P_God (@the_P_God) · backlist 2026-05-26 · rubric 95.0

37.

Hα Sun time-lapse, the first of hopefully many. 1 hour of data acquisition (modest 30K frames, 130 GB raw), 8+ ho… (t.co)

Hα Sun time-lapse, the first of hopefully many. 1 hour of data acquisition (modest 30K frames, 130 GB raw), 8+ hours of processing (includes a few false starts). Full-size video, workflow details, etc: https://app.astrobin.com/i/59uc7v

by @shipilev (Aleksey Shipilëv) · backlist 2026-05-26 · rubric 95.0

38.

The setup is this: we have some integer linear program that represents the optimal tokenizer problem. We relax it…

The setup is this: we have some integer linear program that represents the optimal tokenizer problem. We relax it to a continuous linear program so we can solve it fast. The solution will typically have fractional values, so we don't direct

by @unixpickle (Alex Nichol) · backlist 2026-05-26 · rubric 94.0

39.

From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models.

From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. KPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-ho

by @AntLingAGI (Ant Ling) · backlist 2026-05-26 · rubric 94.0

40.

RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algo…

RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algorithm, it is the data. We introduce CUA-Gym: a scalable, lightweight synthesis engine that turns arbitrary task queries into

by @BowenWangNLP (Bowen Wang) · backlist 2026-05-26 · rubric 94.0

41.

On-policy Distillation (OPD) can suffer from mode-seeking behavior due to the reverse KL objective. In our recent… (x.com)

On-policy Distillation (OPD) can suffer from mode-seeking behavior due to the reverse KL objective. In our recent work, we address this by augmenting OPD with a forward KL term. Please check out @wg_jin02 's post for more details!

by @kimin_le2 (Kimin) · backlist 2026-05-26 · rubric 94.0

42.

very awesome resource from hugging face with available slides about how they generated 1T synthetic data

very awesome resource from hugging face with available slides about how they generated 1T synthetic data. a really cool sneak peek at what we feed foundation models

by @yacinelearning (Yacine Mahdid) · backlist 2026-05-26 · rubric 93.0

43.

I was nerdsniped over the weekend by this paper. I tried extending it by using various cutting plane strategies t…

I was nerdsniped over the weekend by this paper. I tried extending it by using various cutting plane strategies to train a provably optimal tokenizer. I made some progress, but it's still quite far from solved.

by @unixpickle (Alex Nichol) · backlist 2026-05-26 · rubric 92.0

44.

Started a new personal blog for shorter/informal posts to share ideas with folks!

Started a new personal blog for shorter/informal posts to share ideas with folks! My first post is about capacity in associative memory, why recall is not a sufficient statistic, and loose desiderata for end-to-end learnable AMs.

by @dhruv31415 (Dhruv π) · backlist 2026-05-26 · rubric 92.0

45.

new in-depth blog post time: Inside the Transformer: The Life of a Token

new in-depth blog post time: Inside the Transformer: The Life of a Token a deep dive into a modern dense transformer, i cover YaRN (why does pairwise coordinate rotation induce positional information?), hybrid attention (getting to 160k c

by @gordic_aleksa (Aleksa Gordić (水平问题)) · backlist 2026-05-26 · rubric 92.0

46.

I spent a year of my PhD stuck on a 2002 problem of Schechtman. GPT 5.5-Pro helped me finish: vector balancing fo…

I spent a year of my PhD stuck on a 2002 problem of Schechtman. GPT 5.5-Pro helped me finish: vector balancing for zonotopes (shadows of a cube)! For any zonotope Z ⊂ ℝᵈ, v₁,...,vₙ ∈ Z, there are signs x₁,...,xₙ ∈ {-1, 1} with x₁v₁+...+xₙv

by @vetohaze (Victor Reis) · backlist 2026-05-26 · rubric 92.0

47.

What if the very pretrained prior that lets an RL agent explore tools also destroys the format that made it tool-…

What if the very pretrained prior that lets an RL agent explore tools also destroys the format that made it tool-native? We name this the Tool Prior Paradox — and tame it with PARA-GRPO. Introducing ParaVT: parallel video tool use × agen

by @mwxely464 (Zuhao Yang) · backlist 2026-05-26 · rubric 92.0

48.

The era of "AI forging AI" is officially here!

The era of "AI forging AI" is officially here! Introducing ForgeTrain, the world’s first fully AI-generated production-level pre-training framework. No human in the loop. This is not an experimental prototype, but a true "AI engine" with

by @OpenBMB · backlist 2026-05-26 · rubric 92.0

49.

GitHub - 7h30th3r0n3/CVE-2026-9082-Drupal-PoC: Drupal Core PostgreSQL SQL Injection PoC - CVE-2026-9082. Ethical …

GitHub - 7h30th3r0n3/CVE-2026-9082-Drupal-PoC: Drupal Core PostgreSQL SQL Injection PoC - CVE-2026-9082. Ethical PoC for the Drupal vulnerability allowing anonymous SQL injection through the JSON:API module on PostgreSQL-backed sites. · Git

by @akaclandestine (Clandestine) · backlist 2026-05-26 · rubric 92.0

50.

If acquiring a resource fails, then it's an error condition that should be returned to the caller. If releasing a…

If acquiring a resource fails, then it's an error condition that should be returned to the caller. If releasing a successfully acquired resource fails, then it's a bug that should cause an assertion to be triggered.

by @EricLengyel (Eric Lengyel) · backlist 2026-05-26 · rubric 92.0
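
One way to read the rule in this item, as a sketch rather than the author's own code: acquisition failure is handed back to the caller as a value, while failure to release something that was successfully acquired is treated as a programming error. The helper names `acquire` and `release` are hypothetical, and a file descriptor stands in for "a resource".

```python
import os


def acquire(path):
    """Open a file descriptor; failure is an expected condition, so report it."""
    try:
        return os.open(path, os.O_RDONLY), None
    except OSError as exc:
        # The caller decides what to do: retry, fall back, or propagate.
        return None, exc


def release(fd):
    """Close a descriptor we own; failure here means the program has a bug."""
    try:
        os.close(fd)
    except OSError as exc:
        # Raised explicitly so the check still fires when Python runs with -O.
        raise AssertionError(f"closing fd {fd} failed: {exc}") from exc


fd, err = acquire(os.devnull)
if err is None:
    release(fd)
```

The asymmetry is the point: a caller can reasonably handle a missing file, but there is nothing sensible to do at runtime about a close that fails on a handle the program believed it owned.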

51.

Introducing Preprint (t.co)

Introducing Preprint: what if browser use could be just text? A research experiment that exposes web pages as text files to LLMs, which they can edit to take actions (type, tap, etc.). https://github.com/supermemoryai/preprint …

by @supermemory · backlist 2026-05-26 · rubric 92.0

52.

[ICML' 26] From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models (t.co)

[ICML' 26] From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models https://github.com/RUCKBReasoning/From_Pixels_to_Tokens …

by @rsasaki0109 (Ryohei Sasaki@engineer) · backlist 2026-05-26 · rubric 92.0

53.

Slate powering (part of) an LLM KV cache!

by @criccomini (Chris) · backlist 2026-05-26 · rubric 92.0

54.

We built ASPI to isolate clarification-seeking as its own agent state.

We built ASPI to isolate clarification-seeking as its own agent state. Each benchmark scenario compares: - Execution mode → the agent receives a fully specified task - Clarification mode → the agent must ask follow-up questions before acti

by @ScaleAILabs (Scale Labs) · backlist 2026-05-26 · rubric 91.0

55.

Are we nearing a compute crunch? (x.com)

Are we nearing a compute crunch? In our latest Gradient Update, @luke__emberson and @Jsevillamol estimate how many tokens all the Blackwell chips on Earth could serve, and compare this to total token demand. Direct comparisons are diff

by @EpochAIResearch (Epoch AI) · backlist 2026-05-26 · rubric 91.0

56.

What is the role of text tokens in diffusion? Do they carry anything beyond the text prompt? We study this in FLUX.2 (x.com)

What is the role of text tokens in diffusion? Do they carry anything beyond the text prompt? We study this in FLUX.2 @bfl_ml for the task of reference-guided generation, and found that text tokens hold visual information from the referenc

by @TamarRottShaham (Tamar Rott Shaham) · backlist 2026-05-26 · rubric 91.0

57.

why does agent infra (DBs, Sandboxes, Workflow Engines) need <100ms latency when one call to a thinking model tak…

why does agent infra (DBs, Sandboxes, Workflow Engines) need <100ms latency when one call to a thinking model takes dozens of seconds?

by @almoggavra (Almog Gavra) · backlist 2026-05-26 · rubric 91.0

58.

Not to detract from this work, but TurboQuant is not a competitive method nor a good benchmark. Researchers -- inc…

Not to detract from this work, but TurboQuant is not a competitive method nor a good benchmark. Researchers -- including me -- cannot replicate the TurboQuant paper, and even then, the performance is not great. Please. Just. Stop.

by @Tim_Dettmers (Tim Dettmers) · backlist 2026-05-26 · rubric 91.0

59.

This was a very fun project. In Behavior-Consistent Deep RL, we provide a method that aligns the behavior of inde…

This was a very fun project. In Behavior-Consistent Deep RL, we provide a method that aligns the behavior of independently trained policies. It turns out, this works even in high dimensional spaces. Here are 6 seeds of Humanoids (all ca sam

by @marcel_hussing (Marcel Hussing) · backlist 2026-05-26 · rubric 91.0

60.

RL environment startups might be cooked with this one

RL environment startups might be cooked with this one Incredible work by Bowen and XLANG Lab! Scaling data through end to end synthetic pipelines: Tasks, environment, and verifier all created autonomously through coding agent. Also supe

by @ewveggies (Kyle Wong) · backlist 2026-05-26 · rubric 91.0

61.

Today, we’re sharing a new state of the art for computer use.

Today, we’re sharing a new state of the art for computer use. Our system holds the two highest verified scores on OSWorld, the standard benchmark for AI agents that operate a computer like a person: 83.6% using Claude Opus 4.7 and 81.5% u

by @nealchopra (Neal Chopra) · backlist 2026-05-26 · rubric 91.0

62.

We just shipped a crazy update to Sentinel- we doubled the quality of video without affecting latency. (x.com)

We just shipped a crazy update to Sentinel- we doubled the quality of video without affecting latency. This is teleoperation from ~2k miles away. Scaling teleop is now possible @AveaRobotics

by @aryind_ (Ary Indarapu) · backlist 2026-05-26 · rubric 91.0

63.

Introducing EAGLE 3.1 — a major step forward in speculative decoding robustness, efficiency, and deployability. F… (x.com)

Introducing EAGLE 3.1 — a major step forward in speculative decoding robustness, efficiency, and deployability. From the EAGLE team @hongyangzh , in collaboration with vLLM @vllm_project and TorchSpec teams. > FC norm + post-norm archit

by @lightseekorg (LightSeek Foundation) · backlist 2026-05-26 · rubric 91.0

64.

A little over 2 years ago, I solved the SolidGoldMagikarp stability problem.

A little over 2 years ago, I solved the SolidGoldMagikarp stability problem. Today, I am releasing the results of that work as a new technique to regularize training. More details below.

by @hi_tysam (Fern) · backlist 2026-05-26 · rubric 91.0

65.

Your logging pipeline is a security control.

Your logging pipeline is a security control. If it can be tampered with, turned off, or overwhelmed by an attacker, your detection capability has a kill switch.

by @arnavsharma (Arnav Sharma) · backlist 2026-05-26 · rubric 91.0

66.

Your Embedding Model is SMARTer Than You Think! Single-vector models actually hide powerful multi-vector capabili… (t.co)

Your Embedding Model is SMARTer Than You Think! Single-vector models actually hide powerful multi-vector capabilities in their frozen hidden states. We introduce SMART, a framework that unlocks this ability for SoTA multimodal retrieval.

by @HyperStorm9682 (Harris Zhang) · backlist 2026-05-26 · rubric 91.0

67.

Vibe Coding A Human Designer App With ThreeJS: Exports and Trade-Offs

Vibe Coding A Human Designer App With ThreeJS: Exports and Trade-Offs. We can now fully export a designed human with custom skin, deformations, hairs and clothing as glb!! However, glb knows nothing about my super nice hair shader so i wor

by @alightinastorm (robot) · backlist 2026-05-26 · rubric 91.0

68.

Today we’re releasing 1-bit and Ternary Bonsai Image 4B.

Today we’re releasing 1-bit and Ternary Bonsai Image 4B. A new family of image-generation models designed to run high-quality diffusion inference on local hardware: from laptops to phones.

by @eraznafre (Erfanzar) · backlist 2026-05-26 · rubric 90.0

69.

Agentic tasks are the biggest story. There was a meaningful increase on Vibe Code Bench, +22pp from its predecess…

Agentic tasks are the biggest story. There was a meaningful increase on Vibe Code Bench, +22pp from its predecessor Qwen 3.6 Plus. We also saw increases on Finance Agent v2 +8pp and Terminal Bench 2 +14pp. These are large gains across the b

by @ValsAI (Vals AI) · backlist 2026-05-26 · rubric 90.0

70.

Building a Speculative Decoding Inference

Building a Speculative Decoding Inference: speculative decoding (sds) is when a small "draft" model predicts multiple tokens fast, then a big "target" model verifies them all at once. if done right, you get ~2x faster generation without any

by @mohitwt_ (mohit) · backlist 2026-05-26 · rubric 90.0

71.

Someone on social media was bragging they got a CSAM website taken offline. They illustrated this by showing a Cl…

Someone on social media was bragging they got a CSAM website taken offline. They illustrated this by showing a CloudFlare report. The report shows the domain this person reported. CloudFlare clearly states it is being investigated, forward

by @vxunderground (vx-underground) · backlist 2026-05-26 · rubric 90.0

72.

Introducing MathCode 0.2.0: maximize prompt-cache hit rates and reduce API costs by up to 90%. (t.co)

Introducing MathCode 0.2.0: maximize prompt-cache hit rates and reduce API costs by up to 90%. Project Page: https://github.com/math-ai-org/mathcode …

by @yifan_zhang_ (Yifan Zhang) · backlist 2026-05-26 · rubric 90.0

73.

This is a killer stack

This is a killer stack. I just started using Wafer to serve my qwen3.6-27b custom fine-tuned LLM and it's excellent

by @garrytan (Garry Tan) · backlist 2026-05-26 · rubric 90.0

74.

new minimax sparse attention compared to deepseek v3.2 (DSA) and v4 (CSA)

new minimax sparse attention compared to deepseek v3.2 (DSA) and v4 (CSA). main changes: based on GQA, not MLA; block-level selection like in CSA, but attention is done on the real KV, not in the compressed dimension

by @eliebakouch (elie) · backlist 2026-05-26 · rubric 89.0

75.

We evaluated CoT faithfulness evaluations & released BonaFide so you can test yours too!!

by @anmarasovic (Ana Marasović) · backlist 2026-05-26 · rubric 89.0

76.

Autoregressive transformers have a core problem that limits their decoding performance: teacher forcing. This tec…

Autoregressive transformers have a core problem that limits their decoding performance: teacher forcing. This technique has been around for a while, and has let us train them massively in parallel. But it has a significant inference gap tha

by @hi_tysam (Fern) · backlist 2026-05-26 · rubric 89.0

77.

Our supply estimate is based on serving Kimi K2.6, a trillion-parameter model with 32B active parameters. Using 8…

Our supply estimate is based on serving Kimi K2.6, a trillion-parameter model with 32B active parameters. Using 8k:1k input-to-output token requests, we estimate it would be possible to serve ~20B output tok/s, enough to serve every person

by @EpochAIResearch (Epoch AI) · backlist 2026-05-26 · rubric 88.0

78.

We're releasing early results from training Kos-1 Experimental, a Kimi K2.5 checkpoint post-trained on the same m…

We're releasing early results from training Kos-1 Experimental, a Kimi K2.5 checkpoint post-trained on the same medical RL data we used for Kos-1 Lite. As clinical workloads become more agentic, we wanted a model that pairs medical domain

by @bertgodel (Daanish Khazi) · backlist 2026-05-26 · rubric 88.0

79.

KnowledgeDeliver flaw exploited as a zero-day to install web shells (t.co)

KnowledgeDeliver flaw exploited as a zero-day to install web shells https://bleepingcomputer.com/news/security/knowledgedeliver-flaw-exploited-as-a-zero-day-to-install-web-shells/ …

by @BleepinComputer (BleepingComputer) · backlist 2026-05-26 · rubric 88.0

80.

People are building interactive agents (in addition to background agents). The time-to-interactive (TTI) metric m…

People are building interactive agents (in addition to background agents). The time-to-interactive (TTI) metric measures how quickly users see something from the agent. Users often just see a spinner while a sandbox spins up and installs t

by @diptanu (Diptanu Choudhury) · backlist 2026-05-26 · rubric 88.0

81.

Building on (x.com)

Building on @nilinabra 's Soft Muon idea, I found a set of polynomials you can use to compute UΣᵖVᵀ accurately for |p| < 0.9 as efficiently as Newton-Schulz/Polar Express. check it out!

by @varunneal (varun) · backlist 2026-05-26 · rubric 88.0

82.

i got excited when i saw (x.com)

i got excited when i saw @Nick_Prince12 post so i asked my agent something similar.... a US economy snapshot report based on @michaeljburry substack vs status of live US market stats & where things stand now. i let my agent use followi

by @gegatsur (Gega Tsurtsumia) · backlist 2026-05-26 · rubric 88.0

83.

Our on-device TTS model Phonon (100M params) now reaches 1.00% WER on the Seed-TTS English benchmark.

Our on-device TTS model Phonon (100M params) now reaches 1.00% WER on the Seed-TTS English benchmark. Smaller than every model it already beats.

by @GradiumAI (Gradium) · backlist 2026-05-26 · rubric 88.0

84.

The latent-vs-pixel debate misses the point.

The latent-vs-pixel debate misses the point. GPT Image 2 shows what users notice: pixel-level fidelity. Latent models show what scales: compact semantic structure. We connect them by replacing VAE/RAE decoders with a Pixel Diffusion Decod

by @xuanchi13 (Xuanchi Ren) · backlist 2026-05-26 · rubric 88.0

85.

Curious about the secret sauce behind our trillion-scale agentic foundation model? Here it comes!

Curious about the secret sauce behind our trillion-scale agentic foundation model? Here it comes! Last year, we released IcePop to stabilize MoE RL with double-sided masking. As we dive deeper, something unexpected happened: the masking r

by @Jia__Guo (Jia Guo) · backlist 2026-05-26 · rubric 88.0

86.

Over 1 billion PDFs are created every day, but your agents still can’t read them reliably.

Over 1 billion PDFs are created every day, but your agents still can’t read them reliably. Today we’re releasing Parse 2.0, the most accurate document parsing API in the world. Extend already processes millions of pages daily for leading

by @kushalbyatnal (Kushal Byatnal) · backlist 2026-05-26 · rubric 88.0

87.

You can now run GPT, Claude & other models in Unsloth. (t.co)

You can now run GPT, Claude & other models in Unsloth. Connect + run APIs in a local UI: - Code execution, web search, image gen, editing - Auto prompt caching to save costs - Provider features like cites, sandboxes GitHub: https:// gith

by @UnslothAI (Unsloth AI) · backlist 2026-05-26 · rubric 88.0

88.

"Don't trust. Evaluate." (x.com)

"Don't trust. Evaluate." @nearestnabors set out to replace Claude Sonnet with Gemma 4. The evals showed a quantifiably better option. Full walkthrough: capability evals + prompt engineering to ship a local 3B that matches Sonnet, 2x fas

by @ArizePhoenix (arize-phoenix) · backlist 2026-05-26 · rubric 88.0

89.

With CodeAgent, I can finally pick up so many things I’d dropped due to low energy. Blogging is one of them. Th… (t.co)

With CodeAgent, I can finally pick up so many things I’d dropped due to low energy. Blogging is one of them. This blog is ~1% me, 99% the agent https://victorchen96.github.io/auto_research_survey.pdf … (Disclaimer: Just doing this f

by @victor207755822 (Deli Chen) · backlist 2026-05-26 · rubric 88.0

90.

splat-transform's offline rasteriser now supports depth of field.

splat-transform's offline rasteriser now supports depth of field. Each Gaussian dilates by its own circle of confusion in the projection pass. New flags: --f-stop, --focus-distance, --sensor-size. This test was rendered with a simulated

by @slimbuck7 (Donovan Hutchence) · backlist 2026-05-26 · rubric 88.0