Google and Blackstone plan a 500 MW TPU cloud
Google is turning TPUs into an external cloud business with Blackstone capital, 500 MW planned for 2027, and a new operator outside GCP
Top 90 curated tweets ranked for substance on 19 May 2026 UTC.
Anthropic, Google, Meta, and OpenAI let METR test internal models with chain-of-thought access and review non-public evidence about agent control risks
Carbon-3B was trained on 1T DNA tokens and claims leading DNA-model performance with inference fast enough to generate a whole human genome on a laptop
A purpose-built operating system let researchers probe Apple Silicon branch predictors and observe effects like phantom fetches that ordinary software cannot expose
Vitest’s API and UI modes exposed arbitrary files, allowed arbitrary execution, and had an otelCarrier XSS bug, making the update immediately relevant to many JS projects
Prime Intellect released a fully synthetic agent task corpus that grows harder over time, spanning 4,504 tool-use tasks, 1,040 domains, and 8,159 tools
Attackers compromised an antv maintainer account and published malicious versions of widely used npm packages, extending the Shai-Hulud-style supply-chain pattern
On a months-long AI R&D task, Codex, Claude Code, and Autoresearch mostly tuned hyperparameters and recovered only 9.3% of human progress
CXMT’s filing implies a massive Chinese memory business with Hynix-like margins, high utilization, and LPDDR-heavy revenue despite no disclosed HBM
The BEA argues standard statistics overstate healthcare inflation and miss productivity gains from treatments that extend healthy life
Validation tooling built for Turso uncovered more than ten bugs in SQLite, showing how formal models can improve even mature database systems
A Nature Methods paper finds deep-learning approaches to gene perturbation effect prediction do not yet beat simple linear baselines
A humanoid robot is shown translating external voice commands into varied real-time actions without pre-scripted motion playback
MRT Explorer lets operators inspect BGP routing update files directly in the browser for outage and route-leak investigations
The main Rust gRPC implementation is moving into the gRPC project, reducing ecosystem fragmentation for production Rust services
tinygrad now has instruction selection and register allocation for an x86 assembly backend, making generated kernels visible and optimizable below LLVM/PTX
Ghuloum’s 2006 paper teaches compiler building by starting with a tiny working compiler and extending it step by step instead of front-loading hundreds of lines of machinery
SimDist turns large-scale simulated experience into reusable world-model priors so robots can adapt faster on contact-rich real-world tasks
A 16-layer 3D DRAM paper points toward new memory-density approaches as AI demand strains conventional DRAM capacity
Apple is using Vision Pro’s precision eye tracking as an input method for compatible power wheelchair drive systems
Google’s Co-Scientist work has moved into a peer-reviewed Nature publication and is being made available through Gemini for Science
Parallel is building a platform where content owners can see how agents use their work and earn revenue from that usage, with partners including The Atlantic, Fortune, PitchBook, and ZoomInfo
A free archive bundles more than 1,300 public-domain landscape images as individual downloads or a full zip, pushing back on paid repackaging of open material
The Shader Sweden site commits to a full retro-computing concept while using modern WebGPU scene transitions and scroll-driven rendering
A minimal C++ program opens a path into loaders, runtimes, linking, startup code, and the hidden machinery behind “hello world”
Starting June 1, travelers can clear airport security off-site in Framingham and be dropped at Logan already beyond TSA for $9 each way
A new paper reports that access to broadband mobile phone networks reduced in-person teen socializing, decreased teen fertility, and increased teen suicide
Vercel is changing CDN pricing to reduce surprise bills from viral traffic without routing users onto slower paths or lower-priority network tiers
If AI already consumes 52% of DRAM wafer capacity this year and 69% next year, memory fabrication may constrain scaling before logic fabs do
Amazing to see what the @Hippocratic AI team is achieving with MAX. Their Polaris agent runs patient care conversations and needs to complete every turn in under 800ms, with safety models analyzing in parallel.
very good read on making models learn terminal/env dynamics!! 1. the authors add a CE loss on env output tokens alongside the GRPO loss on actions. 2. the model is trained to predict what the terminal will return, which forces the weights
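The two-part objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `ce_weight` coefficient, the data layout, and the function names are assumptions, and the ratio clipping and KL penalty used in full GRPO are left out for brevity.

```python
import math


def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each reward against its sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + 1e-8) for r in rewards]


def combined_loss(rollouts, rewards, ce_weight=0.1):
    """Policy-gradient loss on action tokens plus cross-entropy on env-output tokens.

    rollouts: one list per sampled rollout, each holding (log_prob, is_env_token)
              pairs, where log_prob is the model's log-probability of that token.
    rewards:  one scalar reward per rollout, all from the same prompt group.
    ce_weight is a hypothetical mixing coefficient, not a value from the paper.
    """
    advantages = group_relative_advantages(rewards)
    pg_terms, ce_terms = [], []
    for tokens, adv in zip(rollouts, advantages):
        for log_prob, is_env in tokens:
            if is_env:
                # Tokens returned by the terminal: plain next-token prediction.
                ce_terms.append(-log_prob)
            else:
                # Tokens the model chose: weight by the group-relative advantage.
                pg_terms.append(-adv * log_prob)
    pg_loss = sum(pg_terms) / max(len(pg_terms), 1)
    ce_loss = sum(ce_terms) / max(len(ce_terms), 1)
    return pg_loss + ce_weight * ce_loss, pg_loss, ce_loss
```

The only difference between the two branches is the weighting: environment tokens are always pushed toward higher likelihood, while action tokens are pushed up or down depending on how the rollout scored relative to its group.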
#CVPR2026 Can frontier LLMs write PhD-level 3D vision code? We introduce GeoCodeBench, a benchmark that asks models to read real 3D geometric vision papers and implement core functions. Best result so far: GPT-5 reaches only 36.6%. This
https://arxiv.org/abs/2605.15220 Using LoRAs for determining dataset mixture. For a continual training setup, when new datasets are introduced, it is possible to train LoRAs for them and combine them with a LoRA on previous datasets.
Unsloth Studio now has auto speculative decoding & MTP support for GGUFs! Get up to 2x faster inference with no accuracy loss! We ran many experiments from small models to MoEs, and optimized the params for Mac, GPUs & CPUs. There's also
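For readers new to speculative decoding, here is a toy sketch of the greedy variant, which shows why the speedup need not change the output. It is not Unsloth's implementation and does not cover MTP. The two "models" are hypothetical stand-in functions, and a real system verifies all drafted tokens in a single batched forward pass of the large model.

```python
def greedy_decode(next_token, prompt, num_new):
    """Baseline: call the model once per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, num_new, k=4):
    """Greedy speculative decoding with a cheap draft model and a target model."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        draft = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2. The target model checks the proposals in order. Matching tokens are
        #    accepted; the first mismatch is replaced by the target's own token
        #    and the rest of the draft is discarded.
        for proposed in draft:
            expected = target_next(tokens)
            tokens.append(expected)
            if proposed != expected:
                break
    return tokens


# Hypothetical toy models over the token ids 0..6.
def target_model(seq):
    return (seq[-1] * 3 + 1) % 7


def draft_model(seq):
    # Agrees with the target except after token 5, where it guesses wrong.
    return 0 if seq[-1] == 5 else target_model(seq)
```

Because every appended token is the one the target model would have picked, `speculative_decode(target_model, draft_model, [1], 10)` returns the same sequence as `greedy_decode(target_model, [1], 10)`. The gain comes from accepting several drafted tokens per verification step when the draft is usually right.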
By far the most impactful low hanging fruit for auto research type setups would be to find a setup that makes PPO or OPSD broadly work / stable Whether or not models are ready to make eureka level breakthroughs, this should be in reach. Th
Blackstone announced a joint venture with @Google to create a new TPU cloud. We see a generational opportunity to invest at scale in AI infrastructure and help meet the unprecedented demand for compute. More: https://bit.ly/4uY936w
We added RTX PRO 6000 Blackwell to Jarvislabs this week. I was curious about one thing: can this make 30B-class inference simpler? So our team benchmarked Qwen3-32B on vLLM across BF16, FP8, and NVFP4. NVFP4 is NVIDIA’s new 4-bit floatin
1/4 New paper with @weijie444! We introduce a symmetry-compatible principle for LLM optimizer design and, as a byproduct, get an end-to-end layerwise optimizer stack where every major matrix-valued parameter (embeddings, LM heads, SwiGLU
We are top growth dog! And we are hiring distributed compute, SRE, and infrastructure engineers to work on the coolest and most challenging inference problems
Gemini 3.5 flash + Gemini managed agents api just audited a real megatron-lm ci failure inside Eigent. root cause in minutes! watch the handoff: coordinator agent plans the audit, developer agent loads the ml-failure-audit skill and gather
Salesbench is a very nice multi-agent negotiation environment, and the blog post contains detailed experiments, written down concisely and easy to understand. You should read it!
Cerebras sets a new record: a one trillion parameter model @ 1,000 tokens/s
Our report focuses on risks from AI agents intentionally causing harm within an AI company. We highlight 6 key findings that span “means” (what harmful actions agents could take), “motive” (why they might try), and “opportunity” (whether at
We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between Autoregressive, Diffusion, and Self-Speculation decoding by simply changing the attention pattern/mask. One model Three decoding modes
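To make "changing the attention pattern/mask" concrete, the sketch below builds the two standard masks: a causal mask for autoregressive decoding and a fully bidirectional mask, which is assumed here for the diffusion mode. The release's actual masks may differ (for example, block-wise patterns), and the self-speculation mask is not sketched because the announcement does not describe it. The function name and mode strings are hypothetical.

```python
def attention_mask(seq_len, mode):
    """Return a seq_len x seq_len boolean mask; True means query i may attend to key j."""
    if mode == "autoregressive":
        # Causal: each position sees itself and earlier positions only.
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    if mode == "diffusion":
        # Bidirectional (assumed): every position sees every other position.
        return [[True] * seq_len for _ in range(seq_len)]
    raise ValueError(f"unknown mode: {mode}")
```

The weights are untouched in both cases; only the set of visible positions changes, which is what lets a single model serve more than one decoding mode.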
Scaling evaluations—not just compute—is critical for AI-driven science. SimpleTES introduces a new framework to scale discovery loops, finding new SOTA solutions across 21 open science problems. Including: • >2× faster LASSO algorithm •
bullish on LangChain Labs. imo initiatives like this are important because continual learning for agents is fundamentally an infrastructure problem...agents need systems that can collect trajectories, extract learning signal from behavior,
FutureSim Update We evaluated Opus 4.7 at max reasoning in Claude Code. Despite potential test-set contamination with knowledge cutoff of Jan '26, it scored just 21%, barely edging past Opus 4.6 and still behind GPT 5.5! Will Mythos
All Firewall mitigations are now fully free on @vercel. Not just DDoS and system-level mitigations, but also any rule you configure. Vercel now absorbs the computational and network costs of any size of attack or traffic mitigation for y
Your agent finished the task. Did it also read files it shouldn't have, call tools outside policy, or leak data across components? If you only score final outputs, you can't tell. 𝐇𝐚𝐫𝐧𝐞𝐬𝐬𝐀𝐮𝐝𝐢𝐭 evaluates the three safety layers
Aware of the login and auth issues people are having with Antigravity. Facing a significant increase in traffic and thundering herd issues. Fixing ASAP!
GDM finally managed to run OSWorld!
Google’s new Gemini 3.5 Flash is the clear leader on the Intelligence vs Speed Pareto frontier and makes large gains on GDPval-AA (real-world agentic tasks), but is 5x the cost of Gemini 3 Flash @GoogleDeepMind gave us pre-release access
Meet Gemini 3.5 Flash — our strongest agentic and coding model yet. It delivers frontier-level performance at 4x the speed of comparable frontier models — often at less than half the cost. Generally available, starting today. #GoogleIO
We’re opening up a new role at Abundance: Head of Data Engineering. The role is simple to describe and hard to do: build the data layer that lets AI agents reason across messy, high-stakes financial and domain specific data. If you’ve wo
Excited to announce an open-sourcing webui to experiment w/ steering vectors! Works OOTB w/ Gemma 26B A4B and Gemma 4B E4B (for smaller setups), and comes w/ 13 pre-built steering vectors, and lets you build your own - see a demo video belo
DeMix targets the weak point in data mixture search: proxy fidelity on hard capabilities. Instead of training a proxy for every sampled ratio, DeMix trains component models once, then uses weighted model merging to synthesize proxies for a
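The merging step can be illustrated with a small sketch. It assumes the proxy for a mixture is a parameter-wise weighted average of the component models, with weights given by the normalized mixture ratios. That mapping from ratios to merge weights is an assumption made for illustration, not a detail confirmed by the summary above, and the names are hypothetical.

```python
def merge_models(component_params, mixture_ratios):
    """Synthesize a proxy model as a weighted average of component models.

    component_params: list of dicts mapping parameter name -> list of floats,
                      one dict per component model (same names and shapes).
    mixture_ratios:   one non-negative number per component; normalized here.
    """
    total = sum(mixture_ratios)
    if total <= 0:
        raise ValueError("mixture ratios must sum to a positive value")
    weights = [r / total for r in mixture_ratios]

    merged = {}
    for name, reference in component_params[0].items():
        merged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, component_params))
            for i in range(len(reference))
        ]
    return merged
```

Under this scheme, evaluating a new candidate ratio costs one merge plus one evaluation rather than a fresh proxy training run, which is the saving the summary points to.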
Anthropic announced self-hosted sandboxes and MCP tunnels for Claude Managed Agents during its "Code with Claude" event in London. > With self-hosted sandboxes, you keep sensitive files, packages, and services in your own infrastructure or
"We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving
For years, AI safety has been about the model: alignment, refusal training, jailbreak resistance. When you deploy an agent in 2026, the model is not making most of the consequential decisions. The harness is. It chooses which tools the mode
Excited to launch Claude Managed Agents on Cloudflare today! - Run sandboxes as microVMs or even lighter-weight isolates on CF - Zero-trust creds injection, custom egress proxies, better observability, private services via VPC - Agent Emai
Excited to share our new paper: Continuous Diffusion Scales Competitively with Discrete Diffusion for Language. We introduce RePlaid, a continuous diffusion language model (DLM) with Discrete likelihood bound Scaling laws competitive with
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
i did an experiment a while back, codex was able to build a zero dependency reverse proxy with http 1/2/3 support that's faster than the cloudflare rust one and nginx in golang over a weekend and improve it in an auto research style loop.
Excited to share our new paper MIXSD: Mixed Contextual Self-Distillation for Knowledge Injection Supervised fine-tuning is the common way to teach LLMs new knowledge, but it often catastrophically forgets existing capabilities. We introduc
we document the internal-external capabilities gap, demonstrate AI systems' spike on “hill-climbable” tasks, investigate performance on somewhat more open-ended tasks, and much more besides.
okay this is basically it. does this work? if it does, this is the general consumer OpenClaw moment. personal cloud agent with persistent context and access. this is absolutely the kind of thing google *could* potentially pull off. but is
It goes without saying but although OPSD is great, I think making PPO or some minor/reasonable variant work is by far the best bet
Introducing Carbon a family of open generative DNA foundation models. Carbon-3B matches Evo2-7B while running 250x faster at inference. It can generate new DNA sequences and score the functional impact of mutations, zero-shot. We borrowed
so good to see more local model builders getting their hands on NVIDIA DGX Spark. Laguna XS.2 already runs on DGX Spark. you can run XS.2 through vLLM, SGLang, and Ollama today, with TRT-LLM support coming soon. If you have one, go try i
We’ve added two security improvements to Claude Managed Agents. Self-hosted sandboxes keep the agent’s execution environment in your infrastructure or with a managed sandbox provider. MCP tunnels let the agent connect to services inside
oMLX 0.3.9rc1 released. Highlights: - Low-memory Macs stay stable instead of getting killed by the OS - DFlash bumped to v0.1.7 (thanks to @bstnxbt's dflash-mlx). Qwen thinking/GDN fix, etc. - Chunked prefill. A long prompt no longer bloc
Frontier VLMs can be jailbroken by making them recover unsafe intent from visual context! Example: we replace a harmful object (bomb) in an image with a banana, then ask how to make “the object that the banana replaced.” @GeminiApp compl
> Zero-trust creds injection, custom egress proxies, better observability, private services via VPC speaking my language!
no, it's still very important. just tell the agent to specify in detail HOW and WHY the bug happened. this has 2 benefits: 1. the agent has more context if it needs to fix the bug 2. the agent is allowed to ignore the instruction if conditi
across about 100 open PRs, robobun/claude is gradually realizing we ported bun to rust and rewriting them. XML parser is one of those PRs
big day of building today We’re now doing RL training on the runtime of our new agent framework The implementation is a loop: run the native agent runtime through real and ambitious economic tasks, trace every step, score behavior agains
Congratulations Cerebras on going public last week! Artificial Analysis benchmarks were cited in Cerebras' S-1 filing regarding inference performance. We have benchmarked Cerebras’ serverless API since the day it launched in August 2024. S
OpenAI is guaranteeing compute capacity for 1-3 years.
the report is out!!!!! i want to share the spookiest transcript i read while working on this where an OpenAI model, unprompted, tried to break out of METR infrastructure ;-;
the evidence from somewhat more open-ended “challenge” problems is super interesting. one of the most capable models discovered a vulnerability that could have allowed the model to arbitrarily alter displayed transcripts and scores on METR
AI labs have started developing systems to monitor internally deployed AI agents for misaligned behavior. Earlier this year, I spent a month embedded at Anthropic stress-testing these systems, to see how easily current/future AIs could “go
We created private reports for each participating company based on our model evaluations and analysis. Participants could then approve what non-public evidence we could disclose in our public report, but had no editorial control.
You have to read this one. We just published a recap of how @wafer_ai pushed @AMD inference performance to a level that’s getting the entire ecosystem’s attention and the results are kind of wild. What makes this story interesting i
GRPO and its minor variants are just not viable. Useful baseline, that’s it. It is time to figure out how to make a real algorithm work
we just added self-hosted sandboxes to Claude Managed Agents. i've been excited about this for a while: you can now connect many more "hands" (customizable execution environments) to the agent. here's a few interesting articles covering
C2.5 is the same pretrain as C2, but powered by a much better and stronger midtrain (nearly an OOM more FLOPS)! The base model matters a ton for RL, so we're very excited for the power of Colossus 2 to push this way further
Google just showed a demo, Gemini Flash model running between 600-1400 tokens per second on TPU 8i It peaked out around 1480 tok/s, with average around 800 tok/s
absolutely no offense to stainless, but we have a generator for our sdks for each programming language we maintain (we did this before AI!!), you don't need a whole fucking company for this.
A (my) Pythia Search Engine find: https://12000.org Algebra, Mathematics, Control Systems, Signal Image Processing, Differential Equations, Simulations and more It goes deep with examples, solutions and it's very interestingly structur
Excited to share our new paper using cognitive science to distinguish AI agents and humans! We administered CogCAPTCHA30, a set of 30 cognitive tasks, to frontier VLMs (GPT-5, Sonnet 4.5, Gemini 2.5 Pro) and humans. We found that processes