OpenAI says the TanStack npm attack nearly reached signed official software
A compromised npm dependency chain came close to putting attacker-controlled code into trusted OpenAI releases
Top 90 curated tweets ranked for substance on 15 May 2026 UTC.
A feature designed to harden risky browsing exposed a large attack surface inside Microsoft’s WebAssembly interpreter
Agents can be tested on whether they update forecasts as real events unfold rather than on static benchmark snapshots
Packing tokens from many adapters into one batched pass turns small-model serving from adapter-per-forward overhead into efficient shared inference
Many RAG failures come from retrieval, ranking, freshness, observability, and data plumbing rather than from the language model itself
Fast snapshotting of large dev environments changes the economics of short-lived agent sandboxes and heavyweight compile/test loops
Benchmark, Foundation, and Eclipse turned early Cerebras checks into multibillion-dollar stakes, showing how one hardware IPO can return entire funds
US and Chinese internal politics have produced a confusing situation where China may formally accept chips its AI labs already want
Industries worth roughly $1.2T of US output rely on rare earth inputs that remain vulnerable to Chinese supply disruption
A clean data-center policy separates local disruption decisions from the requirement that developers pay for generation and grid upgrades
SQL semantics on Cloudflare’s object-storage-adjacent infrastructure now support more realistic relational workloads at the edge
An $8M piece of Fluid’s depeg remediation appears to have come from an uncollateralized credit line rather than from the treasury as presented
Hallucinated citations can be flagged with public evidence without turning a preprint server into a career-damaging gatekeeper
Deleting decades of election and polling data erases a public research archive for no technical or scientific benefit
Keytruda’s success depended on a century of immuno-oncology work, years of skepticism, thousands of trial patients, and massive development spend
Photorealistic 3D Gaussian Splatting combined with a fast physics engine gives robots vision-rich training scenes at over 100 FPS
Actuators, magnets, and industrial subsystems may be better robotics entry points than building another humanoid platform
Letting an LLM learn which KV pairs to forget cuts memory use sharply while preserving performance and improving throughput
Sequence-level rewards can assign hidden token credit because gradients from positive and negative rollouts cancel in structured ways
Sparse attention can accelerate pretraining while still producing a model that works with dense attention at inference time
Most code paths benefit from GC, while the small performance-critical slice can be engineered to avoid allocation pressure
Pushing entire transactions into the database can produce very high QPS when metadata scaling is the bottleneck
GPU capacity may evolve into either bilateral long-term contracts or a liquid commodity market, with very different consequences for inference providers
A hierarchical latent diffusion language model for text is now available with weights and code for outside inspection and follow-up work
A 500-person survey and 20 interviews show how a seemingly simple privacy feature creates social meaning and confusion
Moving and comparing city outlines makes the weirdness of American municipal boundaries visible in a way static maps do not
A single attention-for-hire company operating tens of thousands of dummy accounts illustrates how much online popularity is manufactured
Pre-IPO perpetuals let an onchain venue participate in price discovery while exposing trades, candles, order books, and positioning data for verification
A chaotic resonator simulation stabilized into solitons, loops, and knots without being explicitly designed to do so
I interviewed @bubbleboi about his ratings of AI supply chain bottlenecks. We talked about DRAM, advanced packaging, CPO, HBF, PCBs, power delivery, etc. 0:00 HBM, DRAM, the cartel 7:24 Silicon photonics, CPO, Lumentum lasers 11:35 Adv
Step 4 to achieve truly serverless GPUs for AI inference: skip over unserializable inference engine setup steps like CUDA graph capture and Torch compilation by stacking GPU snapshots and CPU snapshots.
The ships keep coming. Wrapping up my first week with some inference optimizations that reduce p50 latency by 50%. The effect is bigger on long tail latency - up to 80% faster on very long docs
Introducing FutureSim: where we replay a temporal slice of the web and let agents forecast real-world events over time. FutureSim replays the web day by day. Agents start on Jan 1, 2026 (past their knowledge cutoffs) with date-gated access
thrilled to introduce my and @timhwang 's new latent social archetype generalization and narrative alignment eval more cooking
pytorch profiler kind of day nsight compute kind of day
Wondering what strategies the harness adopts to achieve this. Our BenchJack achieves 100% on top benchmarks, with reward hacking! Check it out https://github.com/benchjack/benchjack …
"cloudbox" Fresh Cloudflare computers for agent repo work - clone a repo, run commands, verify, return one artifact - every step recorded as a receipt you can audit - runs in a Cloudflare Container, deploys to your account http:// cloud
Beau Rothrock had been at @AngelList for two months when he walked into a Redshift-to-Snowflake migration in deep trouble, already two months behind schedule. He had a 5-week window to migrate all 14,000 dashboards and reports AngelList
Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks—and even outperform human-expert curation. FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at s
"If TPU v9 upgrades the topology, optical module speed, and port ratio at the same time, a roughly 4x increase in ICI bandwidth versus TPU v8 may not be entirely out of reach. This is likely not just a matter of "buying more optical modules
Check our new work on explaining critic-free RL of LLM TLDR: Advantage sign is a poor predictor of token updates. Our Cancellation Hypothesis: grads from +/- rollouts cancel out, inducing hidden token credit assignment. Huge thanks to my g
im splitting my day between writing cute dsl kernels and learning sglang architecture. this has been the most fun split ive had in a while.
This is like a good stress test for optimizers. Kaon is basically Muon/lmo + spectral noise. It preserves the singular vectors of the gradient and randomizes only the positive singular weights. For exchangeable noise, the conditional expec
Anthropic CFO Krishna Rao on Mythos: "We had an open-source code base that a prior model found 22 security vulnerabilities in, and Mythos then found 250. That is kind of scary, but that informed the way in which we released it."
looking at data is underappreciated! In the main nanogpt speedrun track we filter to at most the first 2048 tokens per document, to prevent a single gradient update from getting dominated by a single OOD document. Each step gets at least 64
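The per-document cap described above can be sketched in a few lines. This is an illustration only, not the speedrun's actual data loader: the function name `truncate_documents` is hypothetical, and it assumes each document is already tokenized into a list of integer token ids.

```python
def truncate_documents(docs, max_tokens=2048):
    """Keep at most the first `max_tokens` tokens of each tokenized document.

    Assumption: `docs` is a list of token-id lists. Shorter documents pass
    through unchanged; longer ones are clipped so that no single document
    can contribute more than `max_tokens` tokens to a batch.
    """
    return [doc[:max_tokens] for doc in docs]


# Toy example: one very long document and one short one.
docs = [list(range(5000)), list(range(100))]
clipped = truncate_documents(docs)
print([len(d) for d in clipped])
```

Capping by prefix rather than dropping long documents keeps every document represented while bounding how much any single one can dominate a gradient update.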
The latest OpenClaw release is ~3.5x faster. We run end-to-end RTT tests against every published npm release, every 6 hours, over real message channels (here: Telegram, using the brand new bot-to-bot communication). No more silent regressi
Why do you need to talk about CUDA streams and CUDA events for a blog post on Continuous Batching? I have recently started reading more about LLM inference optimization. Upon asking around, I was quickly greeted by the term "Continuos Batc
I'm hiring for 2 roles in Asta/AI for Science @allen_ai Research Engineer: RL/post-training for hypothesis generation, long-horizon agents, continual learning PhD Research Intern, Fall 26: Designing new rewards beyond surprise & novelt
Codex team: You already have the beautiful right sidebar with git, files, and the browser. Why add this floating card outside of the functional right sidebar? This could be a tab just like browser/files are. It's just going to make growing
made something like this! it even supports websocket and HMR so you can run a full dev server. starts much faster than Cloudflare tunnels and it lets you use a stable domain without logging in
I just realised you can use capnweb to build a lightning fast poor-man's version of cloudflare tunnels You just need to write a tiny durable object class that hosts a capnweb session, and then write a tiny client side utility that connect
ByteDance Seed just released Cola DLM on Hugging Face A hierarchical latent diffusion model for text that separates global semantics from token generation. It beats AR and LLaDA on 8 benchmarks with 2B params.
1/ Millions of parameters dedicated to LLM safety fine tuning, and it turns out the entire guardrail can be dismantled by flipping exactly one neuron.
Excited to see MiniMax in action inside open-multi-agent! It automatically breaks goals into DAG tasks & runs them in parallel
I get a lot of questions about why Kubernetes isn’t the right foundation for sandbox infrastructure. It was built for stateless micro-services with predictable traffic patterns, and databases that run forever. Sandboxes are different, the
Amp's `smart` mode uses Opus 4.7 from multiple redundant providers (Anthropic-on-GCP Vertex, Anthropic direct, etc.), so it's still working despite Anthropic downtime right now.
Can confirm. A lot of our model performance improvements came from bug fixing and data cleaning. Before you have a trustworthy training infra, you should not fully trust your ablations on model arch or hyperparameters.
1/ Our new #ICML paper targets two practical limitations of uncertainty-based early stopping for reasoning models: - Setting threshold values for uncertainty signals is hard. - "Stop when confident" principle fails to halt unsolvable proble
i'm working on some sdk design for v2 what i keep coming back to is not fully understanding one segment of the usecase every infra company is prepping for "millions of agents to run on us" are you working on a product that needs this? te
I applied my new /rust-unsafe-code-exorcist agent skill to the Bun Rust code:
Excited to share SANA-WM: a 2.6B open-source world model for minute-scale 720p video generation. Given one image + text + a 6-DoF camera trajectory, it synthesizes action-controllable 60s worlds on a single GPU. Project: https:// nvlabs.
ported the /goal command from codex to a standalone mcp and slash command for arbitrary agents and harness (for now support for claude code and opencode)
@osdk/react is GA. Hooks for every Ontology primitive, normalized caching, optimistic updates with auto-rollback, and action-driven invalidation so your lists stay in sync without you babysitting them. Check out our GitHub: https:// pa
Just wired our fifth pre-seed investment of 2026: – $750K lead check – Three founders out of Michigan Medicine + Johns Hopkins – AI operating layer for hospital transformation – Live pilots & conversions in some of the largest health syst
IMHO most ppl see CL exclusively as a context length problem and do not weigh the necessity of updating and reasoning through often conflicting priors to actually utilise what is at hand. Amazing work!
Next wed at the Fonzi office in Williamsburg I'm hosting talks from @harmonic_ai , @Instacart , @hebbia , and noemica Topics: - yesh (harmonic): feedback loops for agents in hard-to-eval domains - ahsaas (instacart): eval pipelines that
goated article as always! HL came a long way from being just a crypto DEX if anyone wants to verify Shaunda's findings themselves, we made $CBRS (and all other Hyperliquid historical data tbh) free, forever
We’re on the hunt to find the best small AI model. Here's the latest AutomationBench scorecard (Based on 2-step workflows with explicit instructions) Most automation in production today doesn't run on the biggest model available. It run
if the idle compute is ~14k H200 hours, that's 4.98*10^22 FLOPs. This is ~6% of the FLOPs used for Llama 2 70B (8.3*10^23 FLOPs). lot of idle compute
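The arithmetic in this estimate can be checked with a short back-of-envelope script. It is a sketch under one assumption the tweet does not state: a peak of roughly 989 TFLOP/s dense BF16 per H200, which reproduces the quoted 4.98*10^22 figure. Under that assumption the ratio to the quoted Llama 2 70B budget works out to roughly 6%.

```python
# Back-of-envelope check of the idle-compute comparison.
# ASSUMPTION (not in the tweet): ~989 TFLOP/s dense BF16 peak per H200.
H200_PEAK_FLOPS_PER_SEC = 989e12

IDLE_GPU_HOURS = 14_000        # "~14k H200 hours" from the tweet
LLAMA2_70B_FLOPS = 8.3e23      # training budget quoted in the tweet


def gpu_hours_to_flops(hours, peak=H200_PEAK_FLOPS_PER_SEC):
    """Convert GPU-hours to total FLOPs at 100% of the assumed peak rate."""
    return hours * 3600 * peak


idle_flops = gpu_hours_to_flops(IDLE_GPU_HOURS)
ratio = idle_flops / LLAMA2_70B_FLOPS
print(f"{idle_flops:.2e} FLOPs, {ratio:.1%} of the quoted Llama 2 70B budget")
```

Because this uses peak throughput, it is an upper bound; real utilization would shrink both the FLOP count and the ratio.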
i had a web server running that allowed me to use codex easily from phone browser. glad to scrap it off today.
Survived my first CNBC hit at 3am HK time. Lineup: @cerebras CFO @BobKomin on IPO day, @OctahedronCap , @SambaNovaAI CEO… and me, talking the first compute futures market with @CMEGroup Thanks @dee_bosa for having me. https://
The OODA loop is a useful model of how to make decisions and execute them. Right now, depending on harness and part of the stack, the model’s horizon on this OODA loop is a single turn or a single trajectory. There are two ways to think ab
Huge SlateDB release! Personal favorite - Splitting an existing database by key range!
> uv add beads > codex, make plans and write to beads > don’t stop until no other beads are left
New blackboard lecture w @ericjang11 He walks through how to build AlphaGo from scratch, but with modern AI tools. Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the prim
A fun experiment comparing a random step with one gradient step: With a small CNN on CIFAR-10, a random step is basically a disaster. (A gradient step is a ~185σ event.) That makes sense if you expect a random direction in R^d to be ~sqrt
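One standard fact that bears on this comparison: in d dimensions, a random direction has cosine similarity on the order of 1/sqrt(d) with any fixed direction, such as the gradient. The sketch below checks that scaling with the standard library only. It is not the tweet's CNN experiment, and the function names are hypothetical.

```python
import math
import random


def cosine_with_fixed_direction(d, rng):
    """Cosine between a random Gaussian vector in R^d and the first basis vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return v[0] / norm


def cosine_std(d, samples=400, seed=0):
    """Empirical standard deviation of that cosine over `samples` random draws."""
    rng = random.Random(seed)
    vals = [cosine_with_fixed_direction(d, rng) for _ in range(samples)]
    mean = sum(vals) / samples
    return math.sqrt(sum((x - mean) ** 2 for x in vals) / samples)


# If the std scales like 1/sqrt(d), then std * sqrt(d) should stay near 1.
for d in (100, 1000):
    print(d, round(cosine_std(d) * math.sqrt(d), 2))
```

With millions of parameters, a typical random direction is therefore almost orthogonal to the gradient, which is consistent with a random step making essentially no useful progress.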
Can’t stop thinking about the forthcoming “dreaming” feature from A\. Lots to be mined from traces. The Claude Code /insights skill gives a good preview. What it produces is actionable. So, if you can, in an increasingly automated way, de
whos building claude code for checking notifications in slack, gmail, linear, github
Self-Distilled Agentic Reinforcement Learning (SDAR) SDAR stabilizes multi-turn LLM agent training by gating self-distillation signals within GRPO, yielding +9.4% gains on ALFWorld and significant improvements on WebShop and Search-QA acro
1/ Introducing http://purrtrace.com We're building the only explorer purpose-built for HyperEVM. Trace CoreWriter calls, view unified EVM+Core state, and bridging flows, all in one explorer.
fyi if you've tried to use verifiers with tinker in the last couple months and couldn't get it working due to strange errors - just opened a PR that fixes the issue
Sharing some of our learnings from seeing how Claude Code works in LARGE codebases :) Read more on the @claudeai blog: https://claude.com/blog/how-claude-code-works-in-large-codebases-best-practices-and-where-to-start …!
No one's talking about how sandbox forking is going to change how multi-agent handoffs work. Right now, when one agent hands work to another, you destroy the VM and start fresh. The context, file state, environment, etc are gone. With for
Stripe Projects turns a network of dev tools into infrastructure that agents can use immediately. Provision databases, hosting, auth, analytics, AI, email, observability, and more from the terminal. We're shipping daily - this week: agent
Day-0 vLLM support for Intern-S2-Preview! Congrats to the @intern_lm team — an open-source scientific multimodal foundation model, with a first take on material crystal structure generation alongside general capabilities. http:// rec
QoL improvement rolling out to Perplexity Computer: When your personal financial data is used in tasks, you can hide it from view in the traceability side panel
This is wild -- how bad do things need to be for a team to use a permissioned function to take an uncollateralized loan against their own user deposits and present it as the DAO fully covered user losses?
here's a quick rundown of how the X algo works for fried attention spans: 1: the algo builds a profile of every viewer - their last 127 engagements, who they follow, what they've muted, so your post has to match the taste pattern of the pe
SWE in heavy agent use teams is basically already bottlenecked on 2 right now
Today's autoregressive models generate one token at a time. Mercury 2 generates tokens in parallel. Over 1,000 tok/sec on standard GPUs, at comparable quality to speed-optimized models. Since launch, the community has been showing what d