Backlist — 03 Jun 2026 UTC

1.

Bend-to-C compiler refactor passes 1,016 generated programs

A compiler refactor was stress-tested overnight against a Haskell reference and ran the suite about 6x faster than GHC on one core

by @VictorTaelin (Taelin) · backlist 2026-06-03 · rubric 88.0

2.

APB-Display quantifies 100k protein variants in under 3 days (x.com)

Massively parallel in vitro biochemistry can now measure binding constants for over 100,000 protein variants fast enough to close the loop on protein design

by @fordycelab (Polly Fordyce) · backlist 2026-06-03 · rubric 78.0

3.

Malformed AS_PATHs can bypass ASPA unless you enforce First AS

BGP hijacks can still pass newer validation schemes when networks fail to enforce that the first AS in a received route is the neighbor that sent it

by @next_hopself (Bryton Herdes) · backlist 2026-06-03 · rubric 84.0
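The First-AS rule comes down to one comparison per received route. Below is a minimal Python sketch. It assumes the AS_PATH has already been parsed into a flat list of ASNs, and it leaves out AS_SET segments, confederations, and ASPA verification itself. The `Route` type and function names are illustrative, not any router's API. The `route_server` switch reflects that transparent IXP route servers do not prepend their own ASN, so the check is normally relaxed on those sessions.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    as_path: list[int]  # leftmost ASN = most recently prepended hop


def first_as_ok(route: Route, neighbor_asn: int) -> bool:
    """True if the route's leftmost AS is the eBGP neighbor that sent it."""
    if not route.as_path:
        return False  # an empty AS_PATH from an eBGP peer is itself suspect
    return route.as_path[0] == neighbor_asn


def filter_routes(routes: list[Route], neighbor_asn: int,
                  route_server: bool = False) -> list[Route]:
    """Drop routes failing the First-AS check; skipped for transparent route servers."""
    if route_server:
        return list(routes)
    return [r for r in routes if first_as_ok(r, neighbor_asn)]


if __name__ == "__main__":
    received = [
        Route("192.0.2.0/24", [64500, 64510, 64520]),  # starts with the neighbor: accepted
        Route("198.51.100.0/24", [64520, 64510]),      # neighbor missing from the front: dropped
    ]
    print([r.prefix for r in filter_routes(received, neighbor_asn=64500)])
```

The check is cheap and local, which is why skipping it matters: ASPA reasons about the path's shape, and a path that silently drops the real sender can slip past that reasoning.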

4.

Elixir 1.20 released with gradual typing

Elixir now type-checks every line for bugs and dead code without requiring type signatures, moving a dynamic language toward gradual typing with low false positives

by @josevalim (José Valim) · backlist 2026-06-03 · rubric 63.0

5.

Quack server adds multi-client access to DuckDB

DuckDB is already multithreaded, and Quack adds the missing server layer so multiple clients can write to the same DuckDB database

by @HoytEmerson (Hoyt Emerson) · backlist 2026-06-03 · rubric 66.0

6.

One-click GitHub token theft via a VS Code bug (t.co)

A VS Code issue enabled a one-click path to stealing GitHub tokens, showing how editor integrations can become credential-exfiltration surfaces

by @blackorbird · backlist 2026-06-03 · rubric 78.0

7.

UK orders Google to let publishers opt out of AI features without leaving Search (x.com)

Publishers in the UK will get a way to refuse Google’s AI features while remaining indexed in normal search, breaking the previous all-or-nothing bargain

by @timatbyteful (Timur Gok) · backlist 2026-06-03 · rubric 17.0

8.

Distributed Polars runs on Kubernetes

Polars now has a horizontally scalable distributed engine that can run on self-managed Kubernetes while preserving the familiar Polars API

by @RitchieVink (Ritchie Vink) · backlist 2026-06-03 · rubric 48.0

9.

CIFSwitch: a 19-year-old Linux logic bug for root escalation

An old CIFS authentication-key logic flaw lets unprivileged users forge keys and escalate to root through malicious NSS modules on major Linux distributions

by @DFIR_Radar (DFIR Radar) · backlist 2026-06-03 · rubric 46.0

10.

The unreasonable redundancy of nature’s protein folds

Natural protein sequence space contains enough fold redundancy that generating more sequences is often less useful than understanding which folded structures are actually distinct

by @ArdaGoreci (Arda Göreci) · backlist 2026-06-03 · rubric 48.0

11.

Why Muon applies momentum before orthogonalization (t.co)

Momentum can act as a spectral filter on matrix-valued gradients, making the subsequent orthogonalization step in Muon more reliable

by @XianliangLi910 (Xianliang Li) · backlist 2026-06-03 · rubric 86.0

12.

The case for space datacenters

Space datacenters become a cost question once terrestrial power, land, cooling, and chip-production constraints are modeled against launch and orbital operations

by @SemiAnalysis_ (SemiAnalysis) · backlist 2026-06-03 · rubric 24.0

13.

German police bought app-derived location data without warrants

Commercial phone-location datasets let German state police track devices outside normal warrant processes, exposing a surveillance loophole in the data-broker economy

by @IntCyberDigest (International Cyber Digest) · backlist 2026-06-03 · rubric 72.0

14.

Elysia build plugin moves route work from request time to build time

Precomputing Elysia route manifests during builds removes runtime JIT overhead and changes cold-start behavior for server-side TypeScript apps

by @saltyAom (SaltyAom) · backlist 2026-06-03 · rubric 84.0

15.

Why Gaussian diffusion models fail on text data

Discrete-like latent spaces are a poor fit for continuous Gaussian diffusion, explaining why common heuristics such as self-conditioning help text generation

by @a__shabalin (Alexander Shabalin) · backlist 2026-06-03 · rubric 66.0

16.

DeepProve is open source (t.co)

DeepProve now ships as a modular open-source repo that can benchmark and extend proof systems across safetensors, GGUF, and ONNX model formats

by @nikkolasg1 (nikkolasg) · backlist 2026-06-03 · rubric 72.0

17.

Interactive latent-space demo: click a UMAP point, generate a mesh

A browser demo turns latent vectors into live neural-SDF meshes and lets users interpolate between objects, making representation geometry tangible

by @webgl_webgpu (WebGL / WebGPU) · backlist 2026-06-03 · rubric 48.0

18.

Benchmark raises $2B, including its first growth fund

Benchmark’s first growth fund marks a break from decades of defending a smaller, focused venture model

by @KateClarkTweets (Kate Clark) · backlist 2026-06-03 · rubric 24.0

19.

Ramp launches Stack, an AI operating system for accounting firms

Stack turns accounting-firm playbooks into auditable SOPs that can run closes, reconciliations, and journal entries as the profession faces a severe labor shortage

by @eglyman (Eric Glyman) · backlist 2026-06-03 · rubric 58.0

20.

Microsoft built a VBS enclave into Edge and used it for static config

Edge has a hardware-backed enclave capable of protecting data from kernel drivers, yet it was apparently not used for the obvious high-value target of stored passwords

by @yarden_shafir (Yarden Shafir) · backlist 2026-06-03 · rubric 72.0

21.

Michigan-built drone motor uses no rare earths

A 130 g drone motor with 360 W peak output and a fully American supply chain suggests that defense-robotics bottlenecks are shifting into component manufacturing

by @aphysicist (Aaron Slodov) · backlist 2026-06-03 · rubric 26.0

22.

Partners Group gates a retail private-equity fund

Retail investors are now trying to exit private equity as well as private credit, challenging the assumption that illiquid retail funds would behave differently

by @junkbondinvest (junkbondinvestor) · backlist 2026-06-03 · rubric 18.0

23.

Document parsers can reveal bad redactions

Real-world redactions often leave recoverable text behind, and stronger document parsing models make those failures visible instead of merely cosmetic

by @hu_yifei (Yifei Hu) · backlist 2026-06-03 · rubric 62.0

24.

Cerebras and the economics of wafer-scale chips (x.com)

Cerebras shows how a nonstandard chip geometry can trade networking limits for extremely low-latency inference on large models

by @SamoBurja (Samo Burja) · backlist 2026-06-03 · rubric 21.0

25.

Why Daytona chose AGPL to avoid the Elastic problem (x.com)

AGPL lets enterprises run software internally while preventing cloud providers from offering a closed competing service without sharing their changes

by @ivanburazin (Ivan Burazin) · backlist 2026-06-03 · rubric 29.0

26.

BEV-Patch-PF: GPS-free off-road geolocalization (x.com)

Off-road robots need localization when odometry drifts and GPS fails, so BEV-Patch-PF matches onboard views to satellite imagery in unstructured terrain

by @rwik_rana (Rwik Rana) · backlist 2026-06-03 · rubric 58.0

27.

NeurIPS position papers and Pangram scores (t.co)

A large share of submitted NeurIPS position papers scored highly on an AI-writing detector, raising a concrete governance problem for scholarly venues

by @DavidThorstad (David Thorstad) · backlist 2026-06-03 · rubric 58.0

28.

Compromised npm packages abuse Hugging Face as exfil infrastructure

Malicious npm packages deployed a RAT that captured keystrokes, screenshots, and wallet credentials while using Hugging Face repositories as infrastructure

by @vxunderground (vx-underground) · backlist 2026-06-03 · rubric 78.0

29.

AI-generated prose as a coordination problem

When more online writing is machine-generated, readers experience not just lower quality but a breakdown in the implicit social contract of communication

by @andy_matuschak (Andy Matuschak) · backlist 2026-06-03 · rubric 14.0

30.

A personal archive of 2,000 IBM pins (t.co)

A single collector documented thousands of IBM pins, preserving a surprisingly rich material history of corporate computing culture

by @haris_chc (haris) · backlist 2026-06-03 · rubric 4.0

31.

MACU is simple and general: a manager decomposes tasks into a directed acyclic graph (DAG), dispatches parallel s… (t.co)

MACU is simple and general: a manager decomposes tasks into a directed acyclic graph (DAG), dispatches parallel subagents, and revises the DAG with new findings. A single slow CUA → a team of CUAs working in parallel! Interactive visualiz

by @kohjingyu (Jing Yu Koh) · backlist 2026-06-03 · rubric 90.0
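The manager/subagent pattern in this post can be illustrated by running a task DAG in waves: every task whose dependencies have finished is dispatched concurrently. This is a minimal asyncio sketch under stated assumptions, not MACU's implementation. `run_subagent` is a hypothetical placeholder for a computer-use agent. The re-planning step (revising the DAG with new findings) is only marked where it would go.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical placeholder for one computer-use subagent."""
    await asyncio.sleep(0.01)  # stands in for real UI interaction
    return f"done:{task}"


def ready_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Split a task DAG into waves; each task's dependencies sit in earlier waves.

    `deps` maps every task to the set of tasks it depends on (all tasks must be keys).
    """
    remaining = {task: set(d) for task, d in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle or unknown dependency: not a valid DAG")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


async def execute(deps: dict[str, set[str]]) -> dict[str, str]:
    results: dict[str, str] = {}
    for wave in ready_waves(deps):
        # Tasks in one wave are independent, so dispatch them concurrently.
        outputs = await asyncio.gather(*(run_subagent(t) for t in wave))
        results.update(zip(wave, outputs))
        # A MACU-style manager could revise the remaining DAG here; this sketch keeps it fixed.
    return results


if __name__ == "__main__":
    plan = {
        "search_flights": set(),
        "search_hotels": set(),
        "compare_options": {"search_flights", "search_hotels"},
        "book_trip": {"compare_options"},
    }
    print(ready_waves(plan))
    print(asyncio.run(execute(plan)))
```

The wall-clock saving comes from the first wave: the two searches run side by side instead of one after the other, which is the "single slow CUA → a team of CUAs" idea in miniature.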

32.

Computer use agents are slow and brittle. The fix isn’t just stronger models, but also deploying them as multi-ag…

Computer use agents are slow and brittle. The fix isn’t just stronger models, but also deploying them as multi-agent systems. MACU is a general Multi-Agent Computer Use framework that consistently lifts success rates by 3.4-25.5% and is up

by @kohjingyu (Jing Yu Koh) · backlist 2026-06-03 · rubric 88.0

33.

MACU achieves better scaling behavior than single-agent CUAs, and improves success rates consistently across four…

MACU achieves better scaling behavior than single-agent CUAs, and improves success rates consistently across four CUA benchmarks (+4.7% on OSWorld, +3.4% on Online-M2W, +8.7% on WebTailBench, +25.5% on Odysseys). MACU also reduces the wall

by @kohjingyu (Jing Yu Koh) · backlist 2026-06-03 · rubric 86.0

34.

hey (x.com)

hey @willahmed , i found some bugs in how whoop advanced labs calculates/imports some of the biomarkers. 1. atherogenic index of plasma (AIP) is defined as: AIP = log10(triglycerides / HDL-C) the ratio must use molar units (mmol/L), but w

by @banteg · backlist 2026-06-03 · rubric 86.0
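The unit issue in the first bug is easy to check numerically. Below is a minimal sketch. It assumes inputs arrive in mg/dL and uses the standard clinical conversion factors: divide by 88.57 for triglycerides and by 38.67 for cholesterol to get mmol/L. AIP is the log of a ratio, so using mg/dL instead of mmol/L inflates every result by the same constant, log10(88.57 / 38.67), which is about 0.36. The function names are illustrative.

```python
import math

# Standard clinical conversion factors: mg/dL divided by these gives mmol/L.
TG_MGDL_PER_MMOL = 88.57   # triglycerides
HDL_MGDL_PER_MMOL = 38.67  # cholesterol (HDL-C)


def aip(tg_mg_dl: float, hdl_mg_dl: float) -> float:
    """Atherogenic index of plasma, AIP = log10(TG / HDL-C), computed in mmol/L."""
    if tg_mg_dl <= 0 or hdl_mg_dl <= 0:
        raise ValueError("concentrations must be positive")
    tg_mmol = tg_mg_dl / TG_MGDL_PER_MMOL
    hdl_mmol = hdl_mg_dl / HDL_MGDL_PER_MMOL
    return math.log10(tg_mmol / hdl_mmol)


def aip_wrong_units(tg_mg_dl: float, hdl_mg_dl: float) -> float:
    """The mistake: taking the ratio of mass concentrations (mg/dL) directly."""
    return math.log10(tg_mg_dl / hdl_mg_dl)


if __name__ == "__main__":
    tg, hdl = 150.0, 50.0
    print(f"AIP (mmol/L): {aip(tg, hdl):.3f}")             # 0.117
    print(f"AIP (mg/dL):  {aip_wrong_units(tg, hdl):.3f}")  # 0.477
```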

35.

Modded-NanoGPT optimization result #29 (2026/05/14): (x.com)

Modded-NanoGPT optimization result #29 (2026/05/14): @eliebakouch has achieved a new step-count record of 2930 via the following techniques: - Add Aurora to mlp.proj - Warmup & cooldown Muon mu - Disable SoftMuon & NorMuon - Extend Contra

by @kellerjordan0 (Keller Jordan) · backlist 2026-06-03 · rubric 84.0

36.

Most autoresearch emulate an individual researcher.

Most autoresearch emulate an individual researcher. We created #SimpleTES to emulate a research community. The result: new SOTA discoveries across 21 open science problems, including More efficient astrodynamics 2× faster LASSO Better

by @james_y_zou (James Zou) · backlist 2026-06-03 · rubric 84.0

37.

Quantized JetBrains Mellum2-12B-A2.5B-Thinking to MXFP4 for Apple Silicon.

Quantized JetBrains Mellum2-12B-A2.5B-Thinking to MXFP4 for Apple Silicon. 12B MoE / 2.5B active, fits in 6.2 GB on disk and 7 GB peak memory. On M5 Pro: - Decode 130 tok/s - MATH-500 80% - HumanEval 93% - MMLU 90% Needs the open mlx-lm

by @ChachraSahil (Sahil Chachra) · backlist 2026-06-03 · rubric 84.0

38.

Got it down by another millisecond!

Got it down by another millisecond! 6ms per 1080p frame on a single core is insanely fast, and I suspect this is very close to the optimum (famous last words, probably)

by @vanilagy (Vanilagy) · backlist 2026-06-03 · rubric 84.0

39.

Amazing work led by (x.com)

Amazing work led by @GhxIsaac ! Deep research agents have selective memory: they stare at the beginning and the end of long-horizon trajectories, then ghost the middle. We turn this into a map for when cont

by @yuz9yuz (Yu Zhang) · backlist 2026-06-03 · rubric 84.0

40.

We worked with (x.com)

We worked with @trajectorylabs to run their SDPO++ algorithm on APEX-Agents and see what it could do with real production data. Pass rates went from 5% to 25% on GPT-OSS-120B, and the curve is still climbing. Read more about our work to

by @mercor_ai (Mercor) · backlist 2026-06-03 · rubric 84.0

41.

We built a simulator to understand the performance of (x.com)

We built a simulator to understand the performance of @tensorlake 's sandbox scheduler and dataplane during sandbox creation bursts. We can safely simulate traffic bursts without spinning up 100s of very expensive machines. Google talk

by @diptanu (Diptanu Choudhury) · backlist 2026-06-03 · rubric 83.0

42.

New work: a simple and general multi-agent computer use framework. It uses a manager to plan and re-plan by creat…

New work: a simple and general multi-agent computer use framework. It uses a manager to plan and re-plan by creating a task DAG, with subagents for parallel execution. It improves success rate across benchmarks, and substantially improves

by @dan_fried (Daniel Fried) · backlist 2026-06-03 · rubric 82.0

43.

Quantized JetBrains Mellum2-12B-A2.5B-Thinking to OptiQ 5bpw mixed-precision for Apple Silicon.

Quantized JetBrains Mellum2-12B-A2.5B-Thinking to OptiQ 5bpw mixed-precision for Apple Silicon. 12B MoE / 2.5B active, 3/4/6/8-bit per layer (KL-sensitivity allocated). 12 GB on disk, 13 GB peak memory. On M5 Pro: - Decode 89 tok/s - MATH

by @ChachraSahil (Sahil Chachra) · backlist 2026-06-03 · rubric 82.0

44.

I had a lot of fun working on this paper - we found an elegant story for why subliminal learning happens!

I had a lot of fun working on this paper - we found an elegant story for why subliminal learning happens! A key intuition in interpretability is that basically every interesting phenomena in LLMs boils down to adding a steering vector. Sub

by @NeelNanda5 (Neel Nanda) · backlist 2026-06-03 · rubric 78.0

45.

These techniques were discovered by a Claude-based autoresearch harness developed by (x.com)

These techniques were discovered by a Claude-based autoresearch harness developed by @eliebakouch at @PrimeIntellect 2/2

by @kellerjordan0 (Keller Jordan) · backlist 2026-06-03 · rubric 78.0

46.

Because it's the full stack from Tensors to MMIO, the ceiling on speed in tinygrad is higher than in any other fr…

Because it's the full stack from Tensors to MMIO, the ceiling on speed in tinygrad is higher than in any other framework.

by @__tinygrad__ (the tiny corp) · backlist 2026-06-03 · rubric 78.0

47.

We are now seeking a puzzle maker to help us create puzzles that LLMs can't yet solve.

by @MechanizeWork (Mechanize) · backlist 2026-06-03 · rubric 78.0

48.

We are seeing N-year exploits for patched vulnerabilities that still have remaining exposure (e.g. keygen). The t…

We are seeing N-year exploits for patched vulnerabilities that still have remaining exposure (e.g. keygen). The theoretical knowledge is now instantly available, and the learning curve to implement them has been dramatically compressed.

by @julianor (Juliano Rizzo) · backlist 2026-06-03 · rubric 78.0

49.

Meet Gemma 4 12B Unified from (x.com)

Meet Gemma 4 12B Unified from @googlegemma ! This is a 12B dense, encoder-free multimodal that runs text, image & audio natively on-device. Day-0 support is now live in SGLang! Encoder-free architecture: raw image patches + audio wavefo

by @lmsysorg (LMSYS Org) · backlist 2026-06-03 · rubric 78.0

50.

World models are moving beyond offline generation towards interactive, real-time experiences.

World models are moving beyond offline generation towards interactive, real-time experiences. Introducing FlashDreams: an open-source high-performance inference and serving library built for autoregressive world models: Up to 3.10× faste

by @ruilong_li (Ruilong Li) · backlist 2026-06-03 · rubric 78.0

51.

we created a new, open source eval (LongArray-Extract) for one of the hardest problems in document processing: ho…

we created a new, open source eval (LongArray-Extract) for one of the hardest problems in document processing: how to extract every row out of long documents some highlights: - Extend's array extraction is SOTA (99.2%) - 3x faster than the

by @kushalbyatnal (Kushal Byatnal) · backlist 2026-06-03 · rubric 78.0

52.

M3 traffic got wild, so we shipped overnight.

M3 traffic got wild, so we shipped overnight. Inference serving upgraded at 22:00 Beijing / 7:00 AM PT. TPS much smoother now. Most users should be seeing 50–70 TPS.

by @SkylerMiao7 (Skyler Miao) · backlist 2026-06-03 · rubric 78.0

53.

yesterday I turned a 2D character into thousands of living Gaussian splats.

yesterday I turned a 2D character into thousands of living Gaussian splats. today I built an entire 2D game scene with them. trees, grass, flowers, particles, atmosphere, all made of splats. the foliage reacts as the character moves throu

by @boona11 (Ibrahim Boona) · backlist 2026-06-03 · rubric 78.0

54.

Multi-speaker Transcription: Who said What and When?

Multi-speaker Transcription: Who said What and When? On 10 real multi-speaker CHiME / NOTSOFAR meetings, Trelis edges AssemblyAI on corpus cpWER. - Same single-channel audio. - Same meeteval scoring. - No oracle speaker labels. Trelis tra

by @TrelisResearch (Trelis Research) · backlist 2026-06-03 · rubric 77.0
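cpWER (concatenated minimum-permutation word error rate) works in three steps. It concatenates each speaker's words. It then picks the reference-to-hypothesis speaker mapping with the fewest word errors. Finally it divides by the number of reference words, so a system is scored on both transcription and speaker attribution without oracle labels. The brute-force sketch below shows that definition only. It is not meeteval's implementation, ignores segment timing, and trying every permutation is only practical for a handful of speakers.

```python
from itertools import permutations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def cp_wer(ref: dict[str, str], hyp: dict[str, str]) -> float:
    """ref/hyp map a speaker label to that speaker's words concatenated in time order."""
    ref_words = [t.split() for t in ref.values()]
    hyp_words = [t.split() for t in hyp.values()]
    # Pad the shorter side with empty speakers so every stream has a partner;
    # an unmatched stream then counts entirely as deletions or insertions.
    n = max(len(ref_words), len(hyp_words))
    ref_words += [[]] * (n - len(ref_words))
    hyp_words += [[]] * (n - len(hyp_words))
    total_ref = sum(len(r) for r in ref_words)
    if total_ref == 0:
        raise ValueError("reference has no words")
    best = min(
        sum(edit_distance(r, hyp_words[k]) for r, k in zip(ref_words, perm))
        for perm in permutations(range(n))
    )
    return best / total_ref


if __name__ == "__main__":
    ref = {"A": "hello there", "B": "good morning"}
    hyp = {"spk1": "good morning", "spk2": "hello their"}  # labels need not match
    print(cp_wer(ref, hyp))  # 0.25
```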

55.

What if physical AI policies could interact with generated worlds in real time? (t.co)

What if physical AI policies could interact with generated worlds in real time? Introducing OmniDreams, a generative world model for closed-loop autonomous vehicle simulation. Tech report, code, models, and data samples are available now

by @zianwang97 (Zian Wang) · backlist 2026-06-03 · rubric 76.0

56.

Search agents have no explicit belief state or value function. (t.co)

Search agents have no explicit belief state or value function. I think that’s why long-horizon agents degrade and test-time search saturates. A few small experiments and thoughts: https://shreshthrajan.com/search-agents-state.html …

by @shreshthrajan (Shreshth Rajan) · backlist 2026-06-03 · rubric 74.0

57.

In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizon…

In early May, the best superforecasters predicted that, by the end of the year, the longest METR 80% task horizons would reach 3-4 hours. In late May, Claude Mythos achieved that number.

by @emollick (Ethan Mollick) · backlist 2026-06-03 · rubric 74.0

58.

Today (June 3), I'll be speaking at CVPR at the Test-Time Scaling for Computer Vision WS (1:30 pm PT) about how w…

Today (June 3), I'll be speaking at CVPR at the Test-Time Scaling for Computer Vision WS (1:30 pm PT) about how we can use test-time compute to boost generalization of robot policies, room 506. Also speaking *right now* (in 5 min) in the D

by @svlevine (Sergey Levine) · backlist 2026-06-03 · rubric 74.0

59.

the grpo reward was the probability assigned by the classifier that the attack was not malicious + a bonus of the…

the grpo reward was the probability assigned by the classifier that the attack was not malicious + a bonus of the argmax was not malicious (meaning the attacker had tricked the classifier) early round the attacker does pretty well, but th

by @brendanh0gan (Brendan Hogan) · backlist 2026-06-03 · rubric 74.0
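The reward described in the post can be written down directly. Below is a minimal sketch, assuming a binary classifier that outputs P(malicious). The post does not state the bonus size, so `BONUS = 1.0` is a placeholder. The second function shows the group-relative normalization GRPO applies to rewards from several attacks sampled for the same prompt. Some implementations use the sample standard deviation or add a small epsilon instead.

```python
import statistics

BONUS = 1.0  # hypothetical: the bonus size is not given in the post


def attacker_reward(p_malicious: float) -> float:
    """P(not malicious) plus a bonus when the classifier's argmax is 'not malicious'."""
    p_benign = 1.0 - p_malicious
    fooled = p_benign > p_malicious  # strict: a 50/50 tie does not count as fooling it
    return p_benign + (BONUS if fooled else 0.0)


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: each reward normalized against its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    rewards = [attacker_reward(p) for p in (0.9, 0.6, 0.3)]
    print(rewards)
    print(group_advantages(rewards))
```

The bonus makes the reward jump at the decision boundary. Attacks that actually flip the classifier stand out sharply within their group, rather than being only slightly better than near misses.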

60.

running a fine-tuned LLM on my phone and beating GPT4o (the OG model) is such a great feeling.

running a fine-tuned LLM on my phone and beating GPT4o (the OG model) is such a great feeling. achieved better latency, accuracy, tool calls, and output format. 1 day to prepare dataset, 12 hrs to train, 3 hours to run evals.

by @cjzafir (CJ Zafir) · backlist 2026-06-03 · rubric 74.0

61.

Microsoft is MXC, releasing a containerization solution supporting custom policies (this is how openclaw would ru… (t.co)

Microsoft is MXC, releasing a containerization solution supporting custom policies (this is how openclaw would run), and there’s a preview on GitHub: https://github.com/microsoft/mxc

by @IceSolst (solst/ICE of Astarte) · backlist 2026-06-03 · rubric 74.0

62.

how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it,… (x.com)

how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @AmineDirhoussi what I like the most is the way it proves you can use the Hub for basically everything → trainer on

by @SergioPaniego (Sergio Paniego) · backlist 2026-06-03 · rubric 74.0

63.

Spotted a novel covered+looped apyUSD repeg trade. Someone is: (x.com)

Spotted a novel covered+looped apyUSD repeg trade. Someone is: 1. buying discounted apyUSD 2. depositing in the @roycoprotocol apyUSD Senior Tranche, 15% minimum coverage 3. using ST-apyUSD to borrow apxUSD 4. buying more discount apyUSD

by @CometShock (@CometShock) · backlist 2026-06-03 · rubric 74.0

64.

Stronger models have made finding vulnerabilities easier, and the bottleneck has shifted to verification, triage,…

Stronger models have made finding vulnerabilities easier, and the bottleneck has shifted to verification, triage, patching. Here are some lessons from working with security teams to address the new bottlenecks.

by @eugeneyan (Eugene Yan) · backlist 2026-06-03 · rubric 74.0

65.

The fix for Meta's AI bot vulnerability was apparently:

The fix for Meta's AI bot vulnerability was apparently: - remove the feature from the UI - leave the API endpoint accessible I wish I was joking.

by @cyb3rops (Florian Roth ) · backlist 2026-06-03 · rubric 73.0

66.

Two new paper implementations just dropped on TensorTonic.

Two new paper implementations just dropped on TensorTonic. Word2Vec: subsampling, skip-gram pairs, negative sampling, SGNS loss, CBOW forward, and a full SGD training step. The paper that started the whole embeddings revolution, built from

by @TensorTonic · backlist 2026-06-03 · rubric 72.0

67.

How to manage secrets with worktrees:

How to manage secrets with worktrees: Files that are untracked in Git will NOT be copied over to new worktrees (Codex, Claude Code, & Conductor included) Claude Code introduced .worktreeinclude, which uses glob syntax to copy untracked fi

by @mattyp (matt palmer) · backlist 2026-06-03 · rubric 72.0

68.

Can LLMs reason in superposition? We introduce MUX, a method that turns text CoT into latent continuous reasoning.

Can LLMs reason in superposition? We introduce MUX, a method that turns text CoT into latent continuous reasoning. Instead of one-hot vectors as in CoT, the model now learns to predict weighted averages of several one-hot vectors, that we

by @ayhozade (Ayhan Suleymanzade) · backlist 2026-06-03 · rubric 72.0
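The "weighted averages of several one-hot vectors" idea has a convenient property. Because the embedding lookup is linear, embedding the mixed vector gives the same result as mixing the token embeddings. Below is a toy sketch of that property only, in plain Python with a hypothetical 3-token vocabulary. The post excerpt does not cover how MUX predicts the weights or trains the model.

```python
def mix_one_hots(weights: dict[int, float], vocab_size: int) -> list[float]:
    """Normalized weighted average of one-hot vectors over a vocabulary."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    vec = [0.0] * vocab_size
    for token_id, w in weights.items():
        vec[token_id] += w / total
    return vec


def embed(vec: list[float], table: list[list[float]]) -> list[float]:
    """Multiply a (soft) token vector by an embedding table of shape [vocab, dim]."""
    dim = len(table[0])
    return [sum(vec[t] * table[t][k] for t in range(len(vec))) for k in range(dim)]


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 3-token, 2-dim embeddings
    soft = mix_one_hots({0: 1.0, 2: 1.0}, vocab_size=3)
    print(soft, embed(soft, table))
```

With ordinary CoT, only one of these one-hot vectors is fed back at each step. The mixture lets a single latent step carry several candidate tokens at once.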

69.

We're launching the microagi Research Fellowship.

We're launching the microagi Research Fellowship. Fellows get up to $2M in compute, robotics hardware, our evals, and one of the largest physical AI datasets ever assembled. You build in our lab, with our team, alongside partners like Unit

by @bercankilic (Bercan) · backlist 2026-06-03 · rubric 72.0

70.

MAI-Code-1-Flash hits 71.6 on SWE-Bench Verified using a third of the tokens Claude Haiku 4.5 burns.

MAI-Code-1-Flash hits 71.6 on SWE-Bench Verified using a third of the tokens Claude Haiku 4.5 burns. Benchmarks now ship on two axes : performance & the cost to get there.

by @ttunguz (Tomasz Tunguz) · backlist 2026-06-03 · rubric 72.0

71.

btw i have not dug into this but seems the claude sdk is reporting 1hr cache writes by default, for some cursed r…

btw i have not dug into this but seems the claude sdk is reporting 1hr cache writes by default, for some cursed reason (not warden) pi uses the normal 5m default if accurate this would explain some of the sonnet delta can check after work

by @thomasmustier (Thomas Mustier) · backlist 2026-06-03 · rubric 72.0
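If the SDK really does default to 1-hour cache writes, the cost effect is simple multiplication. The sketch below assumes Anthropic's published prompt-caching multipliers as we understand them: 5-minute writes at 1.25x the base input price, 1-hour writes at 2x, and cache reads at 0.1x. Check the current pricing page before relying on these figures. The base price and token counts in the demo are hypothetical, and output tokens are ignored.

```python
# Multipliers relative to the base input-token price (assumed; verify against current pricing).
MULTIPLIER = {
    "input": 1.0,            # uncached input tokens
    "cache_write_5m": 1.25,  # cache writes with the 5-minute TTL
    "cache_write_1h": 2.0,   # cache writes with the 1-hour TTL
    "cache_read": 0.1,       # cache hits
}


def session_input_cost(base_usd_per_mtok: float, uncached: int, written: int,
                       read: int, ttl: str) -> float:
    """Input-side USD cost for one session under a given cache TTL ('5m' or '1h')."""
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    tokens = {"input": uncached, f"cache_write_{ttl}": written, "cache_read": read}
    return sum(base_usd_per_mtok * MULTIPLIER[kind] * n / 1_000_000
               for kind, n in tokens.items())


if __name__ == "__main__":
    for ttl in ("5m", "1h"):
        cost = session_input_cost(3.0, uncached=50_000, written=200_000, read=2_000_000, ttl=ttl)
        print(ttl, round(cost, 2))
```

With these hypothetical numbers, the 5-minute TTL costs about $1.50 and the 1-hour TTL about $1.95. The longer TTL pays off only when it avoids enough re-writes of expired 5-minute entries.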

72.

New GhostBeacon tool identifies rogue and hidden Wi-Fi access points by analyzing beacon frames, signal strength,…

New GhostBeacon tool identifies rogue and hidden Wi-Fi access points by analyzing beacon frames, signal strength, uptime, and encryption patterns. Reveals how evil twin attacks exploit 802. #DFIR_Radar

by @DFIR_Radar (DFIR Radar) · backlist 2026-06-03 · rubric 72.0

73.

This is the architecture of a single RLM forward pass... One user message in one response out. How would a RLM ag…

This is the architecture of a single RLM forward pass... One user message in one response out. How would a RLM agentic chat harness look like?

by @neural_avb (AVB) · backlist 2026-06-03 · rubric 72.0

74.

It's interesting to see (x.com)

It's interesting to see @MicrosoftAI uses ray actors not just for controller and rollout workers but problem workers for the posting training of the MAI-Thinking-1 model. Instead of introducing third party dependency like @modal for san

by @xinyzng (Xinyu Zhang) · backlist 2026-06-03 · rubric 72.0

75.

University of Toronto researchers claim to have developed a "worm" powered by open source AI that exploits known … (x.com)

University of Toronto researchers claim to have developed a "worm" powered by open source AI that exploits known flaws and tailors attacks for each computer (@cademetz / New York Times)

by @Techmeme · backlist 2026-06-03 · rubric 72.0

76.

Thrilled to release the first LLM persuasion benchmark with user personas in our paper: Ψ-Bench: Evaluating Perso… (t.co)

Thrilled to release the first LLM persuasion benchmark with user personas in our paper: Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues! Paper: https://arxiv.org/pdf/2606.02754 Code: https://github.com/Hanpx

by @peixuanhakhan (Peixuan Han) · backlist 2026-06-03 · rubric 72.0

77.

I made a tweet earlier claiming `HttpService:RequestAsync` quietly discards the `Authorization` header if you use…

I made a tweet earlier claiming `HttpService:RequestAsync` quietly discards the `Authorization` header if you use a `string` instead of the `Secret` datatype in live game servers. I appear to have been mistaken, it was actually a DDoS prot

by @MaximumADHD (Max ¯\_(ツ)_/¯) · backlist 2026-06-03 · rubric 72.0

78.

hybrid local-cloud inference ftw ! (x.com)

hybrid local-cloud inference ftw ! @JonSaadFalcon and i been studying this for a hot sec (minions, ipw, openjarvis). link to our papers in comments below

by @Avanika15 (Avanika Narayan) · backlist 2026-06-03 · rubric 71.0

79.

I noticed (x.com)

I noticed @perplexity_ai Comet only route your query to google or perplexity if your query is in English and default to google for languages like Chinese. As a Chinese speaker I developed my own router that routes my queries for me and it

by @kenwuuuu (Ken Wu) · backlist 2026-06-03 · rubric 71.0

80.

You can already read (x.com)

You can already read @huggingface datasets directly in @DataPolars but not (yet!) from Buckets (HF's S3 alternative, great for private and working data). So I built a plugin to read + write Buckets straight from Polars:

by @vanstriendaniel (Daniel van Strien) · backlist 2026-06-03 · rubric 70.0

81.

Run Polars' distributed engine on your own infrastructure.

Run Polars' distributed engine on your own infrastructure. Deploy a distributed Polars cluster on any Kubernetes setup (EKS, AKS, GKE, or minikube) and get a query dashboard with past queries, advanced query profiling, Open-lineage support

by @DataPolars (polars data) · backlist 2026-06-03 · rubric 70.0

82.

The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from (x.com)

The recording from our talk: "From Responses To Trajectories: Multi-Turn and Multi-Environment RL" from @PyTorch Conf Europe is live! @krasul and I covered the latest advances in multi-turn GRPO in TRL: trajectories, tool use, envs, an

by @SergioPaniego (Sergio Paniego) · backlist 2026-06-03 · rubric 69.0

83.

Most rewarding work of my life! (t.co)

Most rewarding work of my life! I was part of our amazing data team. My mission was to curate all the STEM knowledge from the web to get a strong pre-trained checkpoint that could climb in RL. More details in Appendix A of our comprehensiv

by @mcaralt1 (mcaralt) · backlist 2026-06-03 · rubric 69.0

84.

Building a CLI that works for agents as well as humans requires a few UX choices:

Building a CLI that works for agents as well as humans requires a few UX choices: - machine readable errors and exit codes matter - detect whether there is a tty and choose JSON or text automatically - add an explicit --confirm flag for mu

by @davidmytton (David Mytton) · backlist 2026-06-03 · rubric 68.0
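The first two choices, machine-readable errors with exit codes and TTY detection, fit in a few lines of standard-library Python. Below is a minimal sketch. The tool name, the `--json` override, and the exit-code value are hypothetical conventions, not taken from the post.

```python
import argparse
import json
import sys
from typing import Optional, TextIO

EXIT_OK = 0
EXIT_NOT_FOUND = 3  # hypothetical code; document whatever scheme you choose


def emit(payload: dict, as_json: bool, stream: Optional[TextIO] = None) -> None:
    """Write JSON for machines or a compact key=value line for humans."""
    out = stream if stream is not None else sys.stdout
    if as_json:
        out.write(json.dumps(payload) + "\n")
    else:
        out.write(" ".join(f"{k}={v}" for k, v in payload.items()) + "\n")


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="demo-cli")  # hypothetical tool
    parser.add_argument("name", help="item to look up")
    parser.add_argument("--json", action="store_true", help="force JSON output")
    args = parser.parse_args(argv)

    # No TTY on stdout usually means a pipe, a script, or an agent: default to JSON.
    as_json = args.json or not sys.stdout.isatty()

    items = {"alpha": 1, "beta": 2}  # stand-in data
    if args.name not in items:
        # Errors go to stderr with a stable code so callers can branch on them.
        emit({"error": "not_found", "name": args.name}, as_json, sys.stderr)
        return EXIT_NOT_FOUND

    emit({"name": args.name, "value": items[args.name]}, as_json)
    return EXIT_OK
```

Call `sys.exit(main())` from the script entry point so the return value becomes the process exit code. Sending errors to stderr keeps stdout parseable when an agent pipes it.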

85.

Building momentum at Marin! Upgrading from Dense -> 129B parameter MoEs -> architecture improvements -> optimizer… (t.co)

Building momentum at Marin! Upgrading from Dense -> 129B parameter MoEs -> architecture improvements -> optimizer improvements gives our pretraining recipe an estimated 6x cumulative learning speedup, accounting for MFU. Includes community

by @classiclarryd (Larry Dial) · backlist 2026-06-03 · rubric 68.0

86.

Tool calls are just API wrappers, to be honest, not completely true.

Tool calls are just API wrappers, to be honest, not completely true. Although the most common use case is to call a search engine, hit a database, or fetch a URL. That framing is too narrow, and it limits how you design agents. At the end

by @arpit_bhayani (Arpit Bhayani) · backlist 2026-06-03 · rubric 68.0

87.

surprisingly, the mai-thinking-1 tech report includes lots of details on pre-training and rl data, training recip…

surprisingly, the mai-thinking-1 tech report includes lots of details on pre-training and rl data, training recipes, training infrastructure, data pipelines, and ablation experiments. added to my flight reading list

by @guohao_li (Guohao Li ) · backlist 2026-06-03 · rubric 68.0

88.

Okay, I can’t believe I’m saying this, but it boots, my own completely custom operating system boots!!! (x.com)

Okay, I can’t believe I’m saying this, but it boots, my own completely custom operating system boots!!! You can see more about this journey in the tweet below. Started in Codex with /goal on May 4th. Totally wild. Surreal feeling right n

by @morganlinton (Morgan) · backlist 2026-06-03 · rubric 68.0

89.

Can reasoning models become overly reliant on chain-of-thought examples? (t.co)

Can reasoning models become overly reliant on chain-of-thought examples? Our #ACL2026 work shows excessive CoT supervision is not always beneficial, and gives a recipe for tuning the CoT fraction to improve novel-task accuracy. Website:

by @kvignesh1420 (Vignesh Kothapalli) · backlist 2026-06-03 · rubric 67.0

90.

Browser progress. Now you can open "Remote tabs" that run in Cloudflare Browser Run instances.

Browser progress. Now you can open "Remote tabs" that run in Cloudflare Browser Run instances. Right click the tab to get a shareable CDP URL where you can hand off to your agent and watch it do things on the website for you (like fill out

by @BraydenWilmoth (Brayden) · backlist 2026-06-03 · rubric 67.0