AI may take 60% of TSMC N3 wafers in 2026 and 86% in 2027
Leading-edge accelerator supply is becoming a negotiation among TSMC, Apple, and Samsung rather than a normal capacity investment cycle
Balanced toward durable technical and market substance while avoiding multiple near-duplicates from the Opus/GPT benchmark chatter.
A simple Linear lookup can load 42 JSON schemas into the prompt, turning agent tooling into a KV-cache tax
A compliance flag on one external wallet swept a whole public contract into a holding freeze, exposing collateral-damage risk in tokenized finance
Mega heat pumps moved from roughly 165°C and a few megawatts in 2018 to 280°C and up to 100 MW by 2024
Inference clusters are now costly enough that deployment search needs simulation before teams can safely verify configurations on real hardware
A new Linux privilege-escalation bug affects multiple distributions and can grant root access
Investigators seized 200 servers tied to a residential proxy botnet built from infected devices
Parasolid underlies SolidWorks, NX, Onshape, Shapr3D, Plasticity, and many CAD startups, making one geometry engine a hidden industry dependency
A Journal of Economic Literature guide consolidates practical material for using modern difference-in-differences designs correctly
A production data pipeline at Vercel is handling multi-fanout delivery and deduplication at several gigabytes per second
A single transformer block repeated K times can match or beat 3D reconstruction models with 8–10× more parameters while using less compute
A sample-complexity theory argues that hidden hierarchical data makes token prediction harder with depth while latent prediction avoids that blowup
Robots can track when observations, actions, and tasks drift from training distribution before failures become unrecoverable
Git’s source includes a tiny compiler trick for detecting non-constant values, showing how mature C projects encode portability hacks
Modern CPUs are often bottlenecked by memory access, so layout and locality can matter more than instruction-level cleverness
Launching Steam with Chromium debugging enabled allows JavaScript injection through the webSocketDebuggingUrl interface
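To make the mechanism concrete: a Chromium-based client started with remote debugging exposes an HTTP endpoint that lists debuggable targets, and each target carries a `webSocketDebuggingUrl`. The sketch below only enumerates those URLs. The `/json` path and the field name come from Chromium's remote-debugging interface; the port, the sample payload, and the helper names are assumptions for illustration, not details from the item.

```python
import json
from urllib.request import urlopen


def list_targets(port: int, host: str = "127.0.0.1") -> list:
    """Fetch the target list from a Chromium remote-debugging HTTP endpoint.

    The port is whatever the client was launched with (an assumption here).
    """
    with urlopen(f"http://{host}:{port}/json", timeout=5) as resp:
        return json.load(resp)


def debugger_urls(targets: list) -> dict:
    """Map each target's title to its webSocketDebuggingUrl.

    Targets without a WebSocket URL are skipped.
    """
    return {
        t.get("title", ""): t["webSocketDebuggingUrl"]
        for t in targets
        if "webSocketDebuggingUrl" in t
    }


# Hypothetical payload shaped like a /json response, so the example runs
# without a live debugging endpoint.
SAMPLE = json.loads(
    '[{"title": "Example", "type": "page",'
    ' "webSocketDebuggingUrl": "ws://127.0.0.1:8080/devtools/page/ABC"},'
    ' {"title": "NoSocket", "type": "page"}]'
)

print(debugger_urls(SAMPLE))
```

Anything that can reach that local port can open these WebSocket URLs, which is why an enabled debugging port amounts to script access to the client's pages.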
Three second-year associates billing at $1,410 per hour accounted for $725k of April fees in the Spirit Airlines case
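A quick arithmetic check on the figures above, as a sketch: dividing the reported fees by the reported rate gives the implied billed hours. The even split across the three associates is an assumption; the item only gives the combined total, and "$725k" is itself rounded.

```python
# Figures from the item above; the even three-way split is an assumption.
RATE_USD_PER_HOUR = 1_410
TOTAL_FEES_USD = 725_000  # reported as "$725k", so already rounded
ASSOCIATES = 3


def implied_hours(total_fees: float, rate: float) -> float:
    """Hours that must have been billed to reach total_fees at a flat rate."""
    return total_fees / rate


total_hours = implied_hours(TOTAL_FEES_USD, RATE_USD_PER_HOUR)
hours_each = total_hours / ASSOCIATES

print(f"combined hours: {total_hours:.0f}")
print(f"hours per associate (even split): {hours_each:.0f}")
```

That works out to roughly 514 billed hours for the month, or about 171 per associate if the work was split evenly.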
A Toronto AI infrastructure startup reportedly found Bay Area customers before Canadian enterprises would even trial the product
A lockable physical barrier between cabin and cockpit adds protection during the moments when the cockpit door must open
Private archives from a NeXT colleague point to a lesser-known chapter of Jobs doing business with intelligence agencies
Newly surfaced Hong Kong photographs capture low-flying Kai Tak approaches and Kowloon Walled City before their disappearance
Across 83 synthetic-pretraining experiments, most diversity metrics failed to predict data quality while G-Vendi stood out
Reasoning-model chains of thought appear to share recurring high-level operations across models rather than only token-level patterns
A multi-agent pipeline collected the largest open dataset of frontier research-level math problems for evaluating mathematical reasoning
For some patients, access to an early clinical trial abroad can be a life-changing medical option rather than an abstract geopolitics issue
Manual projection boundaries, object outlines, and colliding 3D layers make projection mapping approachable with a custom Three.js workflow
Publishing safety evals and datasets lets outside researchers reproduce, scrutinize, and extend government AI-safety work
Major-provider LLM pricing has dropped sharply enough to change workload economics even as total usage continues to rise
Had such a blast working with @erictang000, @charlie_ruan, @sumanthhegde, and @pcmoritz on enabling multi-LoRA RL training in SkyRL! We observed ~3x higher experiment throughput compared to running experiments in the traditio
I am truly blown away by Qwen-3.5-27B. It's doing better than Haiku 4.5 on my OOD interp task that involves 50k context in an agentic setting. Such a great cheap model for research tasks.
We don’t kill sandboxes if connections are alive, and users are programming against that guarantee now. Maybe others do it too?
Full build: ASRock ROMED8-2T motherboard, Epyc 7443P CPU, 512GB Samsung DDR4, Noctua fans, 2 PSUs (a 1600W and a 1000W), RTX 3090 for TTS and STT, 4x RTX Pro 6000s, makeshift chassis, 6TB total NVMe. Each RTX Pro is capped at 275W, with 2 GPUs per PSU.
Besides token faithfulness (TITO), there are a few more challenges I noted in long-form agent RL. tl;dr: - Rollouts take 80%+ of overall time. Long-tail rollouts (e.g. looping errors) are ubiquitous, so efficient async RL is a must. - Correct
Microsandbox just leveled up (v0.5.x): native SSH & SFTP, no sshd needed; new file commands (copy, mkdir, rm); hardened mounts + env-backed secrets; configurable OCI upper sizes. Faster networking. Tighter security. Same blazing-fast san
There's a better way to serve your inference stack, you just haven't found it yet. DynoSim is a workload-driven simulation of the Dynamo serving stack that turns exhaustive deployment search into a simulate-then-verify loop. Instead of te
I made a Harness that can automatically generate games, from the same lineage as the previous Auto Quant/Auto PPT, with a built-in hand-rolled RPG Maker-style engine AI can play and modify itself in headless mode, and even secretly add plo
Inference Optimizations Behind the MiMo-V2.5 Series API Price Reductions. Read the full technical blog: https://mimo.xiaomi.com/blog/mimo-v2-5-inference… The V2.5 model family, including MiMo-V2.5 and MiMo-V2.5-Pro, is built on a Hyb
Over the last year, I have become increasingly convinced that memory is fundamentally a retrieval problem, not a storage problem. Most memory systems can store information just fine. The hard part is deciding what to retrieve, when to retr
TLDR: Opus 4.8 high ranks only 5th overall, but it is the cleanest and most reliable result so far: #1 validity, 93.3% clean stops, and 0% wrong-edit-distance failures. It edges out GPT-5.5 on reasoning efficiency, and cost-efficiency, but
Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.
An underrated use case of this is being able to have codex iterate on new disagg + offloading ideas before we go implement them in the engines themselves. Saves us a lot of time and lets us move really really fast.
A few architectural choices we made for Tensorlake sandboxes are helping us move quickly now - 1. Runtime environment driver architecture on the dataplane. This lets us run CloudHypervisor, Firecracker, and gVisor transparently for differe
Here's a cool one we did for a manufacturer who made expanded metal sheets. They needed a transfer mechanism. We made a custom end of arm tool, and built the conveyor in house. Fanuc M900 robot.
I want the following in Codex, Cursor, and OpenCode... 1. Pinned Messages: Let me pin assistant messages to the sidebar for things I want to keep track of but am not ready to address yet. Render as a checklist & jump navigation. 2. Notes:
Generate the agent's "memory" through generation, not retrieval (https://arxiv[.]org/html/2605.21463v1). Conventional agent memory has been dominated by RAG. A mechanism that accumulates past experiences in a bank, retrieves similar cases,
Selective Pressure on the -O2 Gene Pool, linear genetic programming for compiler optimizations in LLVM
I keep seeing new local AI models come out, everyone focuses on how the largest version of it performs. No one mentions how the smaller quantized versions perform. So I built a system to test them all and am posting them online as they comp
it's very well possible that they finally invested hard in some form of context distillation (anthropic circa instructgpt era flavored or opd flavored, take your pick, latter is the hot shit right now sooo?)
Given that Claude seems so lazy in chat (especially with technical search topics), it seems pretty telling about how a harness can make a model far more independent and thorough. GPT 5.5, and many of OpenAI's recent models, seem incredibly
now in @opencode beta: run `opencode serve --discoverable` once; every `opencode` TUI instance after that auto-discovers and connects to the already running server. run /status to see what server you are conn
The models all seem stagnant and the biggest gain for me has been the harness/product - codex or antigrav
In our paper, we also find another interesting angle to see how much deep attention layers hate to compute from what is in their residual stream: If you learn coefficients for standard value vectors in final attention layers, they will be
I was autoresearch enjoying, left codex to research quantization schemas and it achieved unexpected results. Setup: Wrote kernel generator. GPT5.5 searches the right superblock, block sizes, number of outliers, scale&outlier precisions, cod
I think there's a cleaner version of this question: What's the best GSM8k solver that fits in <2MBs that doesn't include (leaks of) test data.
My EU Codex just did the “impossible”: Computer Use plugin, Chrome plugin, no VPN needed, locked & unlocked modes. I wonder what else is "impossible"...? Want the setup? #OpenAI #Codex #AI
Cloudflare is working on Web Search, giving AI agents and Workers real-time access to the public web.
Thanks Lucas! Yes we prefetch the full V in deep layers. The early layers need standard V computation, so when they are doing computation we already know the complete list of token indices to prefetch from.
Some of you asked about how to run DeepSeek inside Codex. The core steps are here
built it. validator → relayer over QUIC. every packet signed with the validator's Ed25519 identity key. client verifies. 1000 packets. 100% signature verification. no middleman touched the data. this is what origin-to-customer looks like in
Tech Blog Release Day 3.5. (Since it's a light article, I'm doing two per day) I wrote about a problem that occasionally arises when working on supercomputers and such using VSCode's Remote SSH feature, detailing the specific symptoms and
An excerpt from my upcoming paper: Renormalization Group Theory of Learning Among other things, the RG approach helps explain what the Muon optimizer is doing. Here, I show a simple experiment (MNIST/MLP3) where AdamW overfits almost imm
Getting close to implementing tmux/zmx like persistent terminals. But what if the pty backend itself needs an update? *Then* you gotta restart those terminals, but Kolu still can 'restore' their state to the extent possible.
The best symbolic solver for GSM8k (i.e., a pure python program) achieved a ~15% test error on GSM8k, which is kind of incredible considering Llama 13B achieved the same :) Both Codex and CC seemed frustrated when pushed to grind beyond 15% a
ResearchMath-14K: 14K open research-level math problems Curated by agents from academic sources, with 220K reasoning traces. Fine-tuning filtered attempts improves Qwen3 by 9.2 points. Newer models also make 5x more fake references.
fascinating. 5.5 is the only model here that doesn't seem to improve monotonically on my problem with test time compute scaling
as my system prompts grow, im using more analogies/metaphors/culture references now. this is 1) more token efficient 2) the models can reason abt applying the vibes/spirit generalisably e.g. instead of specifying how/what to explain before
Claude Opus 4.8 scores 58% on DeepSWE Bench, Pass@1 #2 overall behind GPT-5.5. Keep in mind GPT 5.6 comes out in June along with Mythos later in the month!
In 2026, Chinese CSPs are tendering for 4.5GW-5GW of IT capacity. American CSPs will bring about 20GW-25GW of compute/IT capacity online this year.
Wanna make a game on @Cloudflare ? shipped a live multiplayer driving playground to the starter kit: – drop a `.glb`, get a public edge-cached URL – drive a Ferrari, shoot, hit ramps/cones/barriers – mobile joystick controls – R2 + Worker
Increasingly, HTML Artifacts are becoming a core part of how I work with AI agents. Long-horizon agent sessions need a better way to surface insights about what work it has done. This may not be obvious right now, but as you start to let
I had 14 tabs open just to keep up with AI. arXiv, Papers With Code, every leaderboard, HuggingFace, half a dozen RL-env hubs... So I built one screen for all of it. The Bloomberg terminal for AI research. It's called Sophon
what are people using to manage tpu vms at scale (no gke)? I'm very annoyed with manually rolling docker + ray
I'm not smart enough to know any better, but here's my take on the model industry, especially wrt to using it in coding agents. 1. Models are the entirety of the capabilities. Context engineering is not really a thing. You cannot work arou
for GLM-5 all of these harder benchmarks are included in its rating and it is on par with o3/o4-mini which is almost a 10 month lag
50B tokens = 50MWh = 50 house-months energy usage My back of envelope. Without judgement.
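Unpacking the back-of-envelope equality above into the per-unit rates it implies, as a sketch: the three input figures are the post's own, while the 30-day month and the variable names are assumptions added for illustration.

```python
# The three figures are taken from the post; everything derived is arithmetic.
TOKENS = 50e9        # 50B tokens
ENERGY_MWH = 50.0    # 50 MWh
HOUSE_MONTHS = 50.0  # 50 house-months
DAYS_PER_MONTH = 30  # assumption, used only for the per-day figure

# Energy per token: convert MWh to Wh (1 MWh = 1e6 Wh), then divide.
wh_per_token = ENERGY_MWH * 1e6 / TOKENS
wh_per_1k_tokens = wh_per_token * 1_000

# Household consumption the comparison implies.
mwh_per_house_month = ENERGY_MWH / HOUSE_MONTHS
kwh_per_house_day = mwh_per_house_month * 1_000 / DAYS_PER_MONTH

print(f"{wh_per_1k_tokens:.1f} Wh per 1,000 tokens")
print(f"{mwh_per_house_month:.1f} MWh per house-month")
print(f"{kwh_per_house_day:.1f} kWh per house-day")
```

So the estimate amounts to about 1 Wh per thousand tokens and treats a household as using about 1 MWh per month, or roughly 33 kWh per day.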
Recently cold-emailed @MatiksHQ for an internship as a high schooler. Built a bot that bypassed part of their backend security and could be used to cheat on the platform. The CTO agreed to a call. Called twice. Texted. Followed up. Not
Everyone focused on JiT's move to pixel space. Today, JLT asks a different question: Can the benefits of clean prediction survive entirely within latent space? FID: 6.56 → 2.56 JLT learns to predict latent x directly, rather than velocit
Anthropic cuts its list of unauthorized secondary market sellers from eight to four after the initial notice caused panic and pushback from investors ( @yazhous / Bloomberg)
in the next version of @opencode acp will be faster and more compatible with the protocol. fixed a bunch of zed/windows/mac issues, permission weirdness, dropped image blocks, slow config switches, etc this was a full rewrite of the acp
Demand continues to accelerate with record results across all areas of our business. Revenue: $43.8B, 88% YoY; ISG Revenue: $29.0B, 181% YoY; AI Servers Revenue: $16.1B, 757% YoY; Traditional Servers & Networking: $8.5B, 92% YoY; Storage Reven
Vulnerabilities not being exploitable in a common configuration also happens often when I audit network code during research. The effort to report them, or describe them in a paper, was often not worth it. So these ‘vulnerabilities’ typical
Is GRPO even worth it for long-horizon. So little is learned per rollout token and the reward variance is so high
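The concern above can be made concrete with the group-relative advantage commonly used in GRPO: each rollout's reward is normalized against the other rollouts sampled for the same prompt, and that one scalar is shared by every token in the rollout. A minimal sketch, assuming scalar per-rollout rewards; the function name, the `eps` term, and the sample rewards are illustrative, and some variants drop the standard-deviation division.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its sampling group.

    Returns one advantage per rollout; in GRPO-style training that single
    value is applied to all tokens of the rollout.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four long rollouts with a sparse binary reward.
rewards = [0.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])

# If every rollout in the group gets the same reward, all advantages are
# zero and the group contributes no gradient signal.
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

In the sparse example the three failed rollouts each get about -0.58 and the successful one about 1.73, however many thousands of tokens each rollout contains. That is the sense in which little is learned per rollout token.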
the funniest thing is that if you remove the system prompt opus 4.8 doesn't believe that the "workflow" keyword is real
lol i think i whipped up the simplest, dumbest agent env that a. (broadly) sorts the wheat from the chaff b. also shows you if a model has strong overthinking tendencies by default as is usual, v4 flash is busy paretomogging w.r.t the cost/
In modded-nanogpt we also found that the last couple attention layers hate interacting with the final prediction MLPs. So we work around it with a cached activation from earlier. In the attention residuals paper, Kimi doesn't explicitly men
In our new paper, we naturally derive a new attention variant based on the surprising finding that deep layers benefit the most from learning context-free value vectors, without the input from the residual stream. The attention variant:
guys, i found a notepad.exe heap overflow. it's silly, but i can't believe it's 2026 and this shit still happens.
The broader implication, which we work through in detail in the piece, is that the supply curve for frontier accelerators is now effectively a policy decision inside two or three companies (TSMC, Apple, Samsung), not a capacity-investment q
Every time I see someone publish benchmarks that look impressive on the surface, but the sample size is buried in the footnotes with n < 100. Oh.
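To put a number on why n < 100 matters, here is a sketch of the sampling margin on a benchmark pass rate. It uses the normal approximation to a binomial proportion, which is a simplification, and takes p = 0.5 as an assumed worst case; the function name and the sample sizes are illustrative.

```python
import math


def score_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% interval for pass rate p on n items."""
    return z * math.sqrt(p * (1 - p) / n)


# Margin in percentage points at a few benchmark sizes (p = 0.5 assumed).
for n in (50, 100, 1000):
    print(n, round(100 * score_margin(0.5, n), 1))
```

At n = 100 the margin is close to ±10 percentage points, so a gap of a few points between two models on such a benchmark is within noise.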
Scaling AI is not just about adding GPUs. AMD Pensando Pollara 400 AI NICs give Crusoe's AMD Instinct MI355X GPU instances the high-bandwidth networking needed to support demanding multi-node AI workloads.