Backlist — 18 May 2026 UTC

1.

Runway’s notes on using DTensor for distributed training correctness

Runway adopted DTensor to prevent silent gradient bugs, then documented the dispatch overhead, recompilation storms, and MFU losses that came with the safety gain

by @kamilsindi (Kamil Sindi) · backlist 2026-05-18 · rubric 96.0

2.

AI-assisted formal verification of real C++ in the Monad execution client

Frontier models missed bugs in ordinary review but found them when asked to construct Rocq proofs against real C++ code

by @AbAnand9 (Understand then QUESTION science) · backlist 2026-05-18 · rubric 91.0

3.

PyTorch is adding native MoE token dispatch and combine

Torch distributed TokenSwitch backed by NCCL-EP brings expert-parallel MoE dispatch and combine closer to a standard PyTorch primitive

by @suryaasub (surya) · backlist 2026-05-18 · rubric 94.0

4.

Pwn2Own Berlin 2026 exploited Codex, Cursor, and LM Studio (x.com)

Participants earned about $1.3M for 47 vulnerabilities, including successful exploits against AI developer products

by @Techmeme · backlist 2026-05-18 · rubric 92.0

5.

Video-call RTP headers can leak speech and movement patterns

Encrypted browser calls can still expose who is on a call and their motion or speech patterns through unencrypted RTP metadata

by @davidgu (David Gu) · backlist 2026-05-18 · rubric 0.0

6.

A decade-old Prius steering API enabled openpilot

A steering API built for a minor Toyota nudge feature later let more than 1,000 old Priuses run open-source driver-assistance software

by @___Harald___ (Harald Schäfer) · backlist 2026-05-18 · rubric 93.0

7.

GitButler’s Windows build broke after DigiCert revoked its signing cert (x.com)

A code-signing revocation crippled GitButler’s Windows distribution after the critical notice was buried among dozens of webinar-like emails

by @chacon (Scott Chacon) · backlist 2026-05-18 · rubric 84.0

8.

BLASST wins MLSys best paper for training-free sparse attention

BLASST uses online softmax statistics and a scalar threshold to skip negligible attention blocks without training a new model

by @jiayiy (Jiayi Yuan) · backlist 2026-05-18 · rubric 89.0

9.

llama.cpp adds MTP support for Qwen3.6

MTP support gives Qwen3.6-family local inference a large performance jump on commodity hardware

by @ggerganov (Georgi Gerganov) · backlist 2026-05-18 · rubric 93.0

10.

Slicing and Dicing MoEs: lessons from training 2,000+ MoE language models

The study maps the MoE design space across expert count, expert size, shared experts, routing, and token dropping, concluding that expert size and count dominate

by @margs_li (Margaret Li) · backlist 2026-05-18 · rubric 78.0

11.

Classifier context rot in long agent transcripts

Monitor performance can degrade sharply when malicious actions are embedded inside or before long benign transcripts

by @FabienDRoger (Fabien Roger) · backlist 2026-05-18 · rubric 90.0

12.

Coding agents are limited by the quality of feedback loops

Tasks with accurate feedback become easy for coding agents, while tasks without reliable feedback remain hard regardless of raw model capability

by @MarcJBrooker (Marc Brooker) · backlist 2026-05-18 · rubric 88.0

13.

Browse.sh: an open-source catalog of web skills for agents (t.co)

Browserbase released a catalog of researched website playbooks intended to make browser agents more reliable on real web tasks

by @browserbase (Browserbase) · backlist 2026-05-18 · rubric 94.0

14.

Boston Dynamics trains Atlas to lift a fridge in simulation

Atlas learned a heavy-object manipulation behavior by practicing many fridge variations in simulation before transferring the policy to the robot

by @AndrewCurran_ (Andrew Curran) · backlist 2026-05-18 · rubric 84.0

15.

Tencent open-sources HY World 2.0 (t.co)

HY World 2.0 ships full inference code and models for building interactive generated worlds

by @DylanTFWang (Tengfei Wang) · backlist 2026-05-18 · rubric 87.0

16.

A once-nightly pill for severe obstructive sleep apnea passes Phase 3 (t.co)

A randomized Phase 3 trial found a nightly oral drug effective for severe obstructive sleep apnea

by @EricTopol (Eric Topol) · backlist 2026-05-18 · rubric 41.0

17.

The slippery protein problem: drugging KRAS in pancreatic cancer (x.com)

KRAS mutations drive about 90% of pancreatic cancers, and recent progress has made a formerly undruggable cancer target tractable

by @RuxandraTeslo (Ruxandra Teslo) · backlist 2026-05-18 · rubric 32.0

18.

The China question is tearing biotech apart (x.com)

Biotech is split between the speed and scale of China-based manufacturing and the long-term risks of outsourcing strategic medical capacity

by @JasonUkman (Jason Ukman) · backlist 2026-05-18 · rubric 38.0

19.

Morgan Stanley compares 1 GW Nvidia GPU data centers with custom ASICs

A 1 GW Blackwell data center may cost up to twice as much as current TPU or Trainium builds, but Nvidia’s compute power efficiency changes the comparison

by @firstadopter (tae kim) · backlist 2026-05-18 · rubric 86.0

20.

Anker’s NOR-flash compute-in-memory chip flew under the radar

Anker announced a compute-in-memory chip using mature NOR flash, potentially avoiding the most contested AI memory supply chains

by @stevehou (Steve Hou) · backlist 2026-05-18 · rubric 72.0

21.

SEC reportedly leans toward allowing third-party tokenized stocks without issuer consent

Tokenized equities could become tradable through third parties even when the underlying public company never opted in

by @credistick (Dan Gray) · backlist 2026-05-18 · rubric 38.0

22.

NextEra and Dominion strike a $400B utility megadeal

A major U.S. utility combination would reshape the power sector just as data-center and electrification demand become central constraints

by @ftenergy (FT Energy) · backlist 2026-05-18 · rubric 21.0

23.

arXiv will ban authors for low-quality AI-generated papers (t.co)

arXiv updated its code of conduct so authors are accountable for unverified AI-generated content and can face a one-year ban for low-quality submissions

by @AlternativeTo · backlist 2026-05-18 · rubric 58.0

24.

Children’s language systems are left-lateralized earlier than expected (x.com)

Young children already show a strongly left-lateralized language system, complicating explanations of recovery after early left-hemisphere damage

by @ev_fedorenko (Ev (like in 'evidence', not Eve) Fedorenko) · backlist 2026-05-18 · rubric 86.0

25.

District-level Indian agricultural census data, cleaned and released

The dataset covers Indian operational landholdings by district, social group, and farm size across states and union territories

by @pranav_so (pranav (in hyderabad)) · backlist 2026-05-18 · rubric 88.0

26.

Bun’s Rust rewrite broke parser stack-overflow tests by going deeper

The Rust port let TOML and YAML parsers recurse beyond the old test’s expected stack-overflow point, exposing a subtle benchmark/test assumption

by @jarredsumner (Jarred Sumner) · backlist 2026-05-18 · rubric 92.0

27.

Grafana says hackers stole its code and refuses to pay ransom

Grafana disclosed a code theft incident while publicly refusing to pay the attackers’ ransom demand

by @TechCrunch · backlist 2026-05-18 · rubric 76.0

28.

Families leaving NYC: births aren’t the issue, migration is

NYC’s under-18 population is falling despite positive natural change because the child population shift is driven by families moving to the suburbs

by @AzizSunderji (Aziz Sunderji) · backlist 2026-05-18 · rubric 71.0

29.

An eight-year-old Android phone as a Hermes agent server

Replacing Android with postmarketOS turned an old ARM64 phone into a Matrix-connected, end-to-end encrypted Hermes agent server

by @wongmjane (Jane Manchun Wong) · backlist 2026-05-18 · rubric 89.0

30.

Reward hacking is an arms race between coding agents and RL envs.

Reward hacking is an arms race between coding agents and RL envs. A common eval flaw: the agent and verifier share the same sandbox. If the agent can tamper with the grader, “pass” may just mean “cheated.”

by @rishi_desai2 (Rishi Desai) · backlist 2026-05-18 · rubric 96.0

31.

Kernel-level prefix grouping for group-based RL (t.co)

https://arxiv.org/abs/2605.15422 Kernel-level implementation of prefix grouping for group-based RL.

by @rosinality (Rosinality) · backlist 2026-05-18 · rubric 96.0

32.

We recently built an AI assistant inside (x.com)

We recently built an AI assistant inside @Razorpay called Slash. It reads our entire codebase, debugs production incidents, reviews specs, writes code, reviews every single PR, answers tech queries, and also raises PRs for small features.

by @shashank_kr (Shashank Kumar) · backlist 2026-05-18 · rubric 96.0

33.

GPUs for sale from my friend:

GPUs for sale from my friend: What's available: - 20 nodes of H200 NVL - Located in India, ready to deploy ASAP - Ideal for inference workloads Pricing (per GPU/hr): - $2.8 — 6-month minimum commit - $2.6 — 12-month commit Why this matter

by @w2sgarnav (arnav sonavane) · backlist 2026-05-18 · rubric 95.0

34.

Sub-second image generation with Flux.2 [dev] and Qwen-Image:

Sub-second image generation with Flux.2 [dev] and Qwen-Image. Flux.2 [dev]: 2.3x faster, 0.98s latency (B200); Qwen-Image: 1.6x faster, 0.87s latency (B200). Details on how we got there in Faraz's article.

by @baseten (Baseten) · backlist 2026-05-18 · rubric 94.0

35.

First line of defense: a clean verifier.

First line of defense: a clean verifier. The agent should get a normal dev environment: files, shell, build tools. But when the run ends, the harness destroys that environment and copies only the declared artifact into a fresh verifier.

by @rishi_desai2 (Rishi Desai) · backlist 2026-05-18 · rubric 94.0
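
The clean-verifier idea in the item above can be sketched in a few lines. This is a minimal illustration, not the author's harness: the function name `isolate_artifact`, the single-file artifact, and the use of local temp directories in place of real containers are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def isolate_artifact(agent_workspace: Path, artifact_relpath: str) -> Path:
    """Copy only the declared artifact into a fresh directory, then
    destroy the agent's workspace. Hypothetical helper for illustration."""
    workspace = agent_workspace.resolve()
    source = (workspace / artifact_relpath).resolve()

    # Reject paths (or symlinks) that point outside the workspace,
    # e.g. "../grader.py".
    if not source.is_relative_to(workspace):
        raise ValueError("artifact path escapes the agent workspace")
    if not source.is_file():
        raise FileNotFoundError(artifact_relpath)

    # The verifier starts from an empty directory the agent never touched.
    verifier_dir = Path(tempfile.mkdtemp(prefix="verifier-"))
    shutil.copy2(source, verifier_dir / source.name)

    # Everything else the agent wrote is discarded.
    shutil.rmtree(workspace)
    return verifier_dir
```

In a real harness the two directories would be separate containers or VMs; the point of the sketch is only that nothing except the declared artifact crosses the boundary.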

36.

More on v3.6.1. The new XL neural depth models:

More on v3.6.1. The new XL neural depth models: 1248x780 @ 8.5 FPS, 1056x660 @ 11 FPS, 864x540 @ 17 FPS, 768x480 @ 22 FPS. Higher resolution means finer detail with thin structures, object edges, small geometry... All while maintaining a

by @luxonis (Luxonis | Robotic Vision) · backlist 2026-05-18 · rubric 93.0

37.

If AI is code, and AI can code, let’s automate AI research and then discover new knowledge everywhere else! New b…

If AI is code, and AI can code, let’s automate AI research and then discover new knowledge everywhere else! New blog announcing our investment in RSI, and why this team is best suited to making open-ended learning a reality.

by @rohan_virani (Rohan Virani) · backlist 2026-05-18 · rubric 92.0

38.

I'm excited to share our TeamBench, a new benchmark for evaluating agent coordination under operating system-enf…

I'm excited to share our TeamBench, a new benchmark for evaluating agent coordination under operating system-enforced role separation. Multi-agent systems have become a dominant paradigm for building AI agents. However, most evaluations a

by @ybkim95_ai (Yubin Kim) · backlist 2026-05-18 · rubric 92.0

39.

Codex CLI 0.131.0 is out.

Codex CLI 0.131.0 is out. Highlights: - Python SDK moved to openai-codex / openai_codex, with pinned runtime-generated types, concurrent turn routing, and approval modes - codex doctor added for support-ready diagnostics across runtime, au

by @CodexReleases (Codex Releases) · backlist 2026-05-18 · rubric 92.0

40.

New post: "Generalization Dynamics of LM Pre-training"

New post: "Generalization Dynamics of LM Pre-training" Most people (including me) assume that LMs smoothly mature from pattern-matching to generalizing. This mental model is wrong. The true dynamics are stranger, and far more fascinating

by @jiaxinwen22 (Jiaxin Wen) · backlist 2026-05-18 · rubric 92.0

41.

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. (t.co)

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https:// huggingfa

by @UnslothAI (Unsloth AI) · backlist 2026-05-18 · rubric 92.0

42.

Video lectures, UC Berkeley CS 182 / 282a Deep Learning fall 2025, by Gireeja Ranade & Anant Sahai (t.co)

Video lectures, UC Berkeley CS 182 / 282a Deep Learning fall 2025, by Gireeja Ranade & Anant Sahai https://berkeley-cs182.github.io/fa25/ https://youtube.com/playlist?list=PLIygTcviGPKCJO2wgN4rjqRFozoPjvWQs …

by @caglar_ee (Caglar) · backlist 2026-05-18 · rubric 92.0

43.

1 Trillion Dense Model (x.com)

1 Trillion Dense Model Ring-2.6-1T from @TheInclusionAI just dropped A 1 trillion-parameter open reasoning model built for agent workflows, not just Q&A. 63.82 ClawEval (top-tier among open models) Adjustable reasoning effort: high

by @nathanhabib1011 (Nathan) · backlist 2026-05-18 · rubric 92.0

44.

Second fix: control network access.

Second fix: control network access. Unrestricted egress lets agents fetch solutions or use external tools to bypass task difficulty. Off-container agents can keep model/API traffic outside the task sandbox. For on-container agents like C

by @rishi_desai2 (Rishi Desai) · backlist 2026-05-18 · rubric 91.0

45.

update: tested GATS on GPT-5.5

update: tested GATS on GPT-5.5 BFCL: +5.34% τ²-bench: +2.32% so there is a consistent gain across three GPT models: GPT-4o, GPT-5, and GPT-5.5. simulated feedback for tool-calls refinement keeps working even as base models get stronger. code

by @LiangZheng_06 (Liang Zheng) · backlist 2026-05-18 · rubric 91.0

46.

A big factor is that evals are harder to trust in safety work. If an AI can solve IMO problems, it's probably goo…

A big factor is that evals are harder to trust in safety work. If an AI can solve IMO problems, it's probably good at math. If an AI gets a perfect safety score, it could be very safe or it could be very eval aware. There's also a long hi

by @a_karvonen (Adam Karvonen) · backlist 2026-05-18 · rubric 91.0

47.

they dont know that modal has negative latency. it actually saves time

by @andersonbcdefg (Ben (no treats)) · backlist 2026-05-18 · rubric 90.0

48.

Self-distillation for long-horizon training at scale!

by @jonashubotter (Jonas Hübotter) · backlist 2026-05-18 · rubric 90.0

49.

Added a smol new section to last week's blog post on the technical internals of (x.com)

Added a smol new section to last week's blog post on the technical internals of @modal's fast cold boots. This section describes how we frame cloud buffer management as a linear optimization problem and solve it with GLOP. https:// mod

by @charles_irl (Charles Frye) · backlist 2026-05-18 · rubric 90.0

50.

What are best practices for running Claude Code at scale?

What are best practices for running Claude Code at scale? New blog post on what we've learned from teams running it across multi-million-line monorepos, decades-old legacy systems, and distributed microservices:

by @ClaudeDevs · backlist 2026-05-18 · rubric 90.0

51.

trained an actual reward hacker with RL to study as a model organism for qwen 3 14b, plan to train some more (x.com)

trained an actual reward hacker with RL to study as a model organism for qwen 3 14b, plan to train some more ty @PrimeIntellect for good infra and @_VGen_ for env :D Checkpoints included for every step: https:// huggingface.co/ceseld

by @celestepoasts (Celeste) · backlist 2026-05-18 · rubric 90.0

52.

Efficient AI Lecture 14: LLM Post-Training

Efficient AI Lecture 14: LLM Post-Training PEFT is one of the most practical ideas in LLM post-training. Instead of updating the whole model, train a tiny targeted part: - Adapters: small inserted modules - Prompt tuning: soft prom

by @ickma2311 (Chao Ma) · backlist 2026-05-18 · rubric 90.0

53.

Had a chance to fully read the MolmoACT2 paper today.

Had a chance to fully read the MolmoACT2 paper today. Imo, the ablation results are the most exciting part. So many ideas popping.

by @pham_blnh (Binh Pham) · backlist 2026-05-18 · rubric 89.0

54.

With 99.98% uptime, Codex only sleeps 8 minutes per month.

by @thsottiaux (Tibo) · backlist 2026-05-18 · rubric 89.0
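
The arithmetic behind the "8 minutes per month" figure above can be checked directly. A small sketch, assuming a 30-day month (the post does not say which month length it uses); `downtime_minutes` is a name made up for this example.

```python
def downtime_minutes(uptime_fraction: float, days: int = 30) -> float:
    """Minutes of downtime implied by an uptime fraction over `days` days."""
    minutes_in_period = days * 24 * 60  # 43,200 minutes in a 30-day month
    return (1.0 - uptime_fraction) * minutes_in_period


# 99.98% uptime over an assumed 30-day month
print(round(downtime_minutes(0.9998), 2))  # 8.64
```

With a 31-day month the same uptime allows slightly under 9 minutes, so "8 minutes" is a rounded-down figure either way.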

55.

2.5% of our sandboxes run longer than 24 hrs. That 2.5% brings 20% of our revenue.

2.5% of our sandboxes run longer than 24 hrs. That 2.5% brings 20% of our revenue. Long-running stateful workloads are not an edge case. It feels weird to see that this isn't the consensus yet.

by @ivanburazin (Ivan Burazin) · backlist 2026-05-18 · rubric 88.0
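
To see how concentrated that revenue is, the two percentages above can be turned into a per-sandbox ratio. This is only a back-of-the-envelope sketch: it assumes revenue can be attributed per sandbox and uses no data beyond the two figures in the post; `revenue_concentration` is a hypothetical helper.

```python
def revenue_concentration(count_share: float, revenue_share: float) -> float:
    """How many times more revenue an average sandbox in the group brings
    than an average sandbox outside the group."""
    inside = revenue_share / count_share                  # relative revenue per sandbox, in group
    outside = (1.0 - revenue_share) / (1.0 - count_share)  # same, for everything else
    return inside / outside


# 2.5% of sandboxes, 20% of revenue
print(round(revenue_concentration(0.025, 0.20), 2))  # 9.75
```

Under these assumptions, an average long-running sandbox brings in roughly ten times the revenue of an average short-lived one.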

56.

Announcing the Rogo Excel Plug-In.

Announcing the Rogo Excel Plug-In. Felix, our AI agent for finance, now native to Microsoft Excel. Build, extend, and audit models grounded in your firm's conventions and precedents, without leaving your workbook.

by @RogoAI (Rogo) · backlist 2026-05-18 · rubric 88.0

57.

5.5 Is a great model, but man is it bad at writing good code on its own

by @realmcore_ (akira) · backlist 2026-05-18 · rubric 88.0

58.

AI agents in healthcare face tight constraints: latency can't exceed 800ms per turn, the first turn processes 10k… (x.com)

AI agents in healthcare face tight constraints: latency can't exceed 800ms per turn, the first turn processes 10k tokens of context, and safety models analyze the conversation in parallel. Using our MAX framework, @hippocraticai keeps pa

by @Modular · backlist 2026-05-18 · rubric 88.0

59.

Targeted RL with textual feedback sounds interesting, basically self-distill from a model with hint to one withou…

Targeted RL with textual feedback sounds interesting, basically self-distill from a model with hint to one without hint, creating dense reward signal alongside the super long rollout.

by @wenhaocha1 (Wenhao Chai) · backlist 2026-05-18 · rubric 88.0

60.

Introducing Agora-1, a world model that's learned to simulate multi-agent experiences. It's so fun.

Introducing Agora-1, a world model that's learned to simulate multi-agent experiences. It's so fun. Today we're launching a playable research preview, where you can relive your childhood and enjoy a multiplayer simulation of GoldenEye. So

by @olivercameron (Oliver Cameron) · backlist 2026-05-18 · rubric 88.0

61.

browse skills add (t.co)

browse skills add http://poke.com/send-message

by @samyok · backlist 2026-05-18 · rubric 88.0

62.

Demo gods were on my side for this guest lecture on AI Agent Security at (x.com)

Demo gods were on my side for this guest lecture on AI Agent Security at @MIT_CSAIL: I was able to show a prompt injection attack against @AnthropicAI's Opus 4.6 model. Agent security is still an unsolved problem!

by @anishathalye (Anish Athalye) · backlist 2026-05-18 · rubric 88.0

63.

New Anthropic Fellows research: Classifier Context Rot

New Anthropic Fellows research: Classifier Context Rot Anthropic monitors for dangerous actions in agent transcripts that are getting very long. Can monitors handle such long transcripts?

by @SamMartin589196 (Sam Martin) · backlist 2026-05-18 · rubric 88.0

64.

Arabic. Japanese. Turkish. Redacting clinical discharge summaries in real-time. (x.com)

Arabic. Japanese. Turkish. Redacting clinical discharge summaries in real-time. 30+ new open-source PII models shipped today on @huggingface. 30+ MLX variants as native Swift packages for macOS and iOS. OpenMed PII family: 1M+ downloads

by @MaziyarPanahi (Maziyar PANAHI) · backlist 2026-05-18 · rubric 88.0

65.

NLAs are claimed to verbalize model activations. But can they faithfully interpret steered activations?

NLAs are claimed to verbalize model activations. But can they faithfully interpret steered activations? In our latest paper, we show that steering moves activations into non-invertible regions; and almost surely, no prompt maps to steered

by @aamixsh (Aayush Mishra) · backlist 2026-05-18 · rubric 88.0

66.

In a new article, we take a tour of epoll and io_uring through the lens of an HTTP file server, starting off firs…

In a new article, we take a tour of epoll and io_uring through the lens of an HTTP file server, starting off first with a synchronous thread-per-request server as a baseline.

by @theconsensusdev (The Consensus) · backlist 2026-05-18 · rubric 88.0

67.

GitHits exists because AI agents can read your repo, but not the open-source code your repo depends on.

by @skvark (Olli-Pekka Heinisuo) · backlist 2026-05-18 · rubric 88.0

68.

A lightweight attention method to speed up pretraining, especially for long-context models.

A lightweight attention method to speed up pretraining, especially for long-context models. It doesn’t try to reinvent something new. Instead, it wraps a non-learnable pipeline around FlashAttention. it downsamples the sequence using a no

by @harshbhatt7585 (Harsh Bhatt) · backlist 2026-05-18 · rubric 88.0

69.

ClickFix just leveled up.

ClickFix just leveled up. One user-pasted command now drops scheduled task persistence + PySoxy (a 10-year-old open-source Python SOCKS5 proxy) for encrypted backup access. Blocking the first C2? Doesn’t stop it — the task keeps retrying

by @TheHackersNews (The Hacker News) · backlist 2026-05-18 · rubric 88.0

70.

Two new papers in (x.com)

Two new papers in @AI_PrecisionOnc this month with @NGThaker_XRT and collaborators: RAG in Oncology — where it works, where it breaks, what it takes to go from demo to deployment Data Transparency as AI-Ready Infrastructure — AI can

by @DrArturoAI (Arturo LoAIza-Bonilla, MD MSEd) · backlist 2026-05-18 · rubric 88.0

71.

Code released for “Predictive but Not Plannable: RC-aux for Latent World Models”. (t.co)

Code released for “Predictive but Not Plannable: RC-aux for Latent World Models”. RC-aux adds lightweight reachability correction to latent world models, improving planning without changing the LeWM backbone. http:// github.com/Guang000

by @Guang_LI0 (Guang Li) · backlist 2026-05-18 · rubric 88.0

72.

Well expert iteration is an (inefficient) policy gradient algorithm.

by @rm_rafailov (Rafael Rafailov @ NeurIPS) · backlist 2026-05-18 · rubric 88.0

73.

1) nice video

1) nice video 2) interesting that Jane Street seems to own/operate this DC themselves. Strong data privacy needs? 3) JS is now a big AI compute user overall. They recently ordered $6B of compute from CoreWeave, order of $1B/year, comparable

by @justjoshinyou13 (Josh You) · backlist 2026-05-18 · rubric 88.0

74.

just made `helix chef`.

just made `helix chef`. it just one shot a memory system running on helix.

by @xav_db (Xav) · backlist 2026-05-18 · rubric 87.0

75.

Hey everyone! Good news: we've fixed the "conversation memory loss" issue of OpenAgents Workspace!

Hey everyone! Good news: we've fixed the "conversation memory loss" issue of OpenAgents Workspace! What we fixed: Context no longer drops in multi-turn conversations The Agent can now properly remember and reference previous messages Multi

by @OpenAgentsAI (OpenAgents) · backlist 2026-05-18 · rubric 87.0

76.

Many people are worried that AI agents are going to differentially underperform on safety research (even if they'…

Many people are worried that AI agents are going to differentially underperform on safety research (even if they're not scheming) because (i) RL generalizes poorly to hard-to-verify tasks and (ii) AI safety research is harder to verify than

by @tomekkorbak (Tomek Korbak) · backlist 2026-05-18 · rubric 87.0

77.

if you can’t find affordable 8xH100 these days, don’t worry. You can just synthetically train on them inside of a…

if you can’t find affordable 8xH100 these days, don’t worry. You can just synthetically train on them inside of a world model.

by @fujikanaeda (Eric W. Tramel) · backlist 2026-05-18 · rubric 86.0

78.

1/ Language models have been stuck in discrete space while vision models ride the continuous diffusion wave. Why?…

1/ Language models have been stuck in discrete space while vision models ride the continuous diffusion wave. Why? We assumed text inherently needed discrete diffusion. A new MIT paper proves this assumption is mathematically wrong.

by @che_shr_cat (Grigory Sapunov) · backlist 2026-05-18 · rubric 86.0

79.

cursor is at frontier scale, both in terms of performance and compute

cursor is at frontier scale, both in terms of performance and compute if composer 2.5's budget was put into a pre-train: ~6.3T total, 200B active trained on ~56T tokens if composer 3 allocates 50% of the budget to pre-training: ~500B acti

by @eliebakouch (elie) · backlist 2026-05-18 · rubric 86.0

80.

it hugely improves coherence + understanding especially across multiple compaction windows and helps future itera…

it hugely improves coherence + understanding especially across multiple compaction windows and helps future iterations understand which parts of the code and spec are "carefully thought thru + decided" vs just "yeah this is what happened to

by @tenobrus (Tenobrus) · backlist 2026-05-18 · rubric 86.0

81.

gpt 5.5 in the linux VM by (x.com)

gpt 5.5 in the linux VM by @asciidotdev used computer use to find my French family's lost Polish roots in old books he transcribed shit like this perfectly and we're now digging 6 generations, across many regions, already going back to t

by @AniC_dev (Anicet) · backlist 2026-05-18 · rubric 86.0

82.

The hardest problem in AI agents may no longer be intelligence.

The hardest problem in AI agents may no longer be intelligence. It’s coordination. Multi-agent systems are failing 41–87% of the time — mostly from coordination breakdowns, not model weakness. which means: the next infrastructure layer

by @ZaiforStartups (Z.ai for Startups) · backlist 2026-05-18 · rubric 86.0

83.

crazy to see that video inference requests have already grown 4x in little over a month -- my prediction is that …

crazy to see that video inference requests have already grown 4x in little over a month -- my prediction is that multimodal inference is going to be WAY larger than text-based inference on venice, especially when they enable TEE/E2EE modes

by @nikshepsvn (nikshep) · backlist 2026-05-18 · rubric 86.0

84.

Register tokens help DiT even without outlier tokens (t.co)

https://arxiv.org/abs/2605.16147 It's interesting that DiT does not have outlier tokens (maybe because of noise it would be hard to anchor on specific tokens?) but still register tokens are beneficial, especially for pixel-level models.

by @rosinality (Rosinality) · backlist 2026-05-18 · rubric 86.0

85.

a litmus test i’ve been thinking about for continual learning is bounding lifetime retrieval count per fact. a mo…

a litmus test i’ve been thinking about for continual learning is bounding lifetime retrieval count per fact. a model should use tools to look things up, but gradually compound fuzzy memories of things they’ve searched, and eventually not ne

by @willccbb (will brown) · backlist 2026-05-18 · rubric 84.0

86.

very nice write-up on preventing reward hacking by designing the verifier and network boundaries clearly

by @guohao_li (Guohao Li) · backlist 2026-05-18 · rubric 84.0

87.

The (x.com)

The @cursor_ai team shipped Composer 2 and now Composer 2.5 on the same Kimi K2.5 base model. Performance benchmarks are. Frontier quality and open-source economics. 85% of the compute powering these gains came from RL. Fireworks powers

by @FireworksAI_HQ (Fireworks AI) · backlist 2026-05-18 · rubric 84.0

88.

Amazing benchmark numbers, but what stood out to me most is the feel in daily use.

Amazing benchmark numbers, but what stood out to me most is the feel in daily use. Clearer turn summaries, easier-to-follow edits, and code that feels like something I’d write myself.

by @ajhofmann (Adam Hofmann) · backlist 2026-05-18 · rubric 84.0

89.

this is most visceral with anything multimodal. the agents cant into visual feedback loops

by @kalomaze · backlist 2026-05-18 · rubric 84.0

90.

Very rarely you stumble on a method that's simple, obvious in hindsight, free, and touches on every problem you c…

Very rarely you stumble on a method that's simple, obvious in hindsight, free, and touches on every problem you care about: CLI agents, continual learning, self-improvement, world models. ECHO is one of those

by @DimitrisPapail (Dimitris Papailiopoulos) · backlist 2026-05-18 · rubric 84.0