OpenAI’s unit-distance result improved a bound; it did not solve the problem
The result produced a construction with more unit-distance pairs than mathematicians expected, while leaving the optimal count open
Balanced major AI/math developments with security, hardware, energy, medicine, markets, policy, programming tools, design archives, and weird durable web artifacts.
A GPUI desktop app claims to open a 30M-row Parquet file in under 100ms while rendering most XLSX formatting
The benchmark tests autonomous exploitation on complex real targets, moving AI cyber-risk discussion from hypotheticals to measured attack capability
April generation crossed 531 TWh for wind and solar versus 477 TWh for gas, after renewables more than doubled in five years
Hugging Face released hardware, runtime, simulation, training environments and identification tools for a repairable open robot-learning platform
Epoch estimates high-bandwidth memory rose from 52% to 63% of AI chip component spending between Q1 2024 and Q4 2025
Eli Lilly’s 2,339-patient 80-week trial positions retatrutide as a stronger weight-loss drug than semaglutide and tirzepatide
The filing recasts SpaceX as a three-business conglomerate with launch, connectivity and compute infrastructure economics pulling in different directions
Pangram Labs says two 2026 regional winners and the 2025 overall winner show heavy AI generation, forcing literary awards to confront provenance
All About Berlin says Google now answers queries using its work instead of sending readers to the site, turning search summarization into a traffic shock
Automated commits abusing GitHub Actions appeared across repositories, highlighting how CI credentials and workflow permissions remain a supply-chain attack surface
OIDC-based publishing plus an approval gate can stop compromised automation from instantly shipping malicious packages to users
Variational, Ostium, Lighter and Ondo are moving toward routing flow to market makers instead of matching users through their own order books
A reported $2B quantum-computing package extends the CHIPS Act pattern of grants paired with public ownership in strategic technology companies
State unemployment insurance data cannot reliably tell whether a layoff was AI-related, so a dashboard built on it could steer policy toward the wrong workers
Built on ElectricSQL’s Durable Streams Protocol, Ursula combines quorum replication, sub-50ms p99 latency and object-storage economics
New DCLM results suggest large models can tolerate and sometimes benefit from nominally low-quality data, challenging a core pretraining assumption
The model attempts to represent smell by predicting activation across roughly 400 scent receptors, giving olfaction a machine-readable coordinate system
Intel is trading bandwidth for capacity and cost by using 160GB of low-power DDR5 memory, which is far cheaper than HBM but changes the packaging problem
Agents can now search and reason over regulatory approvals, clinical studies and endpoint failures without relying on ad hoc web search
The release claims competitive web, GUI and local-file automation from much smaller models, suggesting data and environments are closing part of the scale gap
A pure-JAX rendering engine makes first-person visual environments cheaper to run, which matters because rendering cost bottlenecks RL experiments
The library targets low-microsecond execution for common robotics computations, making differentiable optimization loops easier to run at scale
Evolution can favor strategies that maximize geometric mean fitness rather than arithmetic mean fitness, mirroring robust bankroll growth under uncertainty
The TypeScript database stack is shifting as Drizzle grows past Prisma, Kysely rises behind it and MikroORM crosses 500k downloads
A sanitizer bypass in a widely used XSS defense library is the kind of quiet web-platform vulnerability that can matter more than louder browser bugs
Founders who show one workflow shrinking from hours to minutes are getting more serious investor attention than teams listing many generic agent features
A scanned archive of Japanese symbolmarks preserves a dense design reference set from a high point of corporate identity systems
The site lets naturalists log butterfly sightings while learning taxonomy, turning a niche field guide into participatory data collection
A compact electronics build demonstrates a classic oscillator circuit with an approachable parts list and visible end result
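The item above on geometric versus arithmetic mean fitness can be made concrete with a small calculation. This is a minimal sketch with made-up growth multipliers (the `risky` and `steady` values are assumptions chosen for illustration, not figures from the cited work): a strategy with the higher arithmetic mean can still shrink over time if its geometric mean is below 1.

```python
import math


def arithmetic_mean(xs):
    """Plain average of per-generation growth multipliers."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """Average growth rate that compounds to the same long-run result."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def long_run_size(multipliers, generations, start=1.0):
    """Compound the multipliers in a repeating cycle (good year, bad year, ...)."""
    size = start
    for g in range(generations):
        size *= multipliers[g % len(multipliers)]
    return size


# Hypothetical strategies: multipliers for equally frequent good and bad years.
risky = [2.0, 0.4]    # doubles in good years, collapses in bad ones
steady = [1.1, 1.1]   # modest growth every year

for name, strategy in (("risky", risky), ("steady", steady)):
    print(
        name,
        "arithmetic:", round(arithmetic_mean(strategy), 3),
        "geometric:", round(geometric_mean(strategy), 3),
        "size after 20 generations:", round(long_run_size(strategy, 20), 3),
    )
```

The risky strategy wins on arithmetic mean, but because growth compounds multiplicatively, the steady strategy ends up with the larger population. The same logic underlies bankroll-growth arguments.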
Reinforcement learning research with Joseph Suarez
RLDX-1 took first place "officially" in the RoboCasa365 benchmark! https://robocasa.ai/leaderboard.html … Tech report: https://arxiv.org/abs/2605.03269
We're adding a Codex harness to our ARC-AGI-3 agents repo. We're doing ablations to find which tool helps the most: * Codex - No tools * Codex - Scratchpad only (read/write) * Codex - Code only (read/write/execute .py) * Codex - Scratc
CursorBench vs Artificial Analysis Coding Index: the Cursor harness seems to almost always improve the score for opus, but makes the cost/task higher. The opposite holds for gpt5.5 (codex scores are higher, but cost/task with cursor is lower)
When we first partnered with @bernhardsson and @akshat_b , they had a vision to rebuild cloud infrastructure from the ground up for AI. Inference, sandboxes, training, RL, batch compute - the platform they set out to build is here. Comp
https://arxiv.org/abs/2605.21486 Importance of embedding learning rate for hyperparameter transfer and training stability. It aligns with previous work (https://arxiv.org/abs/2407.05872), and maybe with older work (https://arxiv.org/a
Because we are SOTA on browser agents, people forget how good our cloud browser infra is. Today we just overtook #1 on the leaderboard
Frontier AI labs collectively have access to less than 50% of all AI compute in the world (!)
Nathan's @cursor_ai team didn't prompt-engineer their way to Composer 2.5. They trained it. The massive RL program runs RL rollouts on Fireworks, alongside production inference. "Comment to see my prompt" → may work for influencers (in
update on RECAP (and RLT) implementations. the main challenge has been collecting quality data and training a base policy that is good enough to apply post training methods to. Molmoact2 is showing very strong zeroshot performance on so101
got a 10% (relative) increase in eval scores by simply changing the sampling args to the recommended ones what are we doing man
the new appshots feature in codex is way more useful than i expected, this is great it captures a screenshot + the window context, including text, urls, file paths, and even text scrolled offscreen (the offscreen part is so nice) my one i
Working hard to be able to buy one of these at some point in the future 7x RTX PRO 6000. Custom liquid cooled system, in a small under desk box.
Command A+ is available on @huggingface with W4A4 quantization. Cut your serving footprint dramatically with virtually zero performance degradation. Try it now: https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4 …
New in-depth blog post: "Dissecting ThunderKittens: Anatomy of a Compact DSL for High-Performance AI Kernels" This post is my attempt to dissect ThunderKittens from the bottom up. I approached TK by asking what each abstraction is really b
A meta-agent in your traces is editing prompts, tools, validators today. The survey "Code as Agent Harness" (May 2026) named the loop. https://arxiv.org/pdf/2605.18747 It didn't name the safety contract. https://medium.com/@epappas/th
Reinforcement learning on visual first-person environments is costly: rendering engines are expensive! Enter JAXenstein: a lightning fast benchmark of first-person environments based on a pure JAX reimplementation of the Wolfenstein 3D ren
'MCParasite is an open-source security research tool that tests LLM-powered agents for susceptibility to context worm attacks, self-propagating prompt injections that spread autonomously across AI agents.' https://github.com/MCParasite/m
https://arxiv.org/abs/2605.20798 Validation of transformer modifications similar to https://arxiv.org/abs/2102.11972 with modern modifications. It is nice to find out Bonferroni correction here. Maybe the setup that has been used here c
pro tip: instrument your agents traces gen_ai[.]conversation[.]id from otel semantics and tomorrow you can query those conversations from the Sentry MCP server (this also unlocks the in-development Conversations view in Sentry)
A holistic, physics based AI-native development and manufacturing stack with dependency/constraint context means an idea/need can turn into real world, validated hardware in hours.
I joined @RatioDrones as one of the founding engineers we are building the MGC 1 for the DARPA LIFT Challenge in august its a turboshaft powered tandem rotor UAV with an 8:1 payload target and it can lift quite insane amounts of payloa
Based on disclosures about data center power capacity and compute spend, the compute used by OpenAI, Anthropic, and xAI is likely <30% of the world total. Google and Meta are giant hyperscalers, but much of their compute goes to cloud and
Activation steering can reliably push a text-to-image generator toward a visual concept, but at a cost: each concept needs its own estimation. HyperTransport (HT) predicts the intervention directly, matching per-concept SOTA at 3–4 orders
CMA just got two god-tier updates 1. Now you can dynamically update the list of tools or MCP servers mid-session In other words, you can hand "tools" to the agent even partway through a session This is seriously something I wanted so bad
An AI agent that understands and explains piping and instrumentation diagrams (P&ID) for plants and chemical factories using graph analysis tool calls
Currently, in INTC's fabs, only two support Foveros. Capacity share < 10%. Samsung is similarly around < 10%. Most are single-die fabs. If fabs support 2.5D/3D packaging: It will also require ordering a large number of new equipment:
had opencode build a way to declaratively apply a schema to a sqlite db in any state - first pass iterated table by table and tried to mutate db to get it to match - i told it to try an ast, diff, apply method - it did that but the public
This I think is a bigger result than most people realize right now: https://arxiv.org/pdf/2605.08605 - They embed a strong prior into very small models of ~1M parameters, a perfect state keeper for Sudoku/Maze states yields almost perfect
DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation. Presents a deep research benchmark of 100 tasks requiring massive evidence collection, reconciliation, and derivation https://a
We regret to inform you that the source code for Ente has been leaked. Most likely, it's being auctioned on the dark web. Whoever purchases it will regret it for the reasons in this thread.
OpenAI Q1 operating margin was -122% even when excluding SBC and other items.
We just shipped NVIDIA-Verified Agent Skills Skills make your agent more capable, but can also introduce vulnerabilities. Verified skills give you transparency into what a skill does, where it came from, what risks it carries, and whether
[preprint] Diffusion models MCMC ! Diffusion model samplers are biased due to discretisation The fix: Metropolis-type adjustment on corrector steps Challenge: no access to the density ratio, only the score Insight: the score (and some m
Happy to give a talk at TikTok hosted by @gy910210 ! Topic: Recent advances in DeepResearch agents: data, training, and infra. Slides here https://github.com/Zhuofeng-Li/Zhuofeng-Li.github.io/blob/main/assets/pdf/zhuofeng_tiktok_d
It was an honor to give the keynote at MLSys Covered how AI systems have evolved, why AI is needed to improve them, why results have disappointed, why the future looks amazing, and why I’m working on this at Core Auto Recording should be ou
Modal turns an entire infra team’s worth of GPU orchestration, autoscaling, & sandboxing into a devex that still feels like writing local Python. Excited for @modal to make it to $4.65B. And grateful to the Modal team for letting @Redpoi
4.5T checkpoint generalised better than 4.9T checkpoint. this screws the pre-existing intuition that generalisation increases monotonically with timesteps and data. no one understands pretraining.
LLM data prep is where a lot of model work quietly gets stuck. DataFlow is an open-source data preparation and training system for generating, refining, evaluating, and filtering AI/LLM data from noisy sources like PDFs, plain text, and lo
I'd like to issue a clarification on the section covered around 1:33:44 of the podcast, where @dwarkesh_sp and I discuss multi-step vs. single-step LLM RL. I muddle through some hand-wavy quadratic variance statements around multi-step RL
The order book is the most important financial primitive and we are proud to have implemented it fully verifiably on chain. We are experimenting with an RFQ feature for whales that uses the Lighter order book directly, take a look!
NSA is releasing security design considerations for AI-driven automation leveraging MCP which, while simplifying the integration of diverse capabilities into powerful agent workflows, requires caution. Learn more: https://nsa.gov/Portals/
Starlink dominates at 61% of revenue ($11.4B). Space launches contribute 22% ($4.1B). AI adds 17% ($3.2B) through X platform & xAI compute infrastructure merged in Feb 2026.
Kawasaki is partnering with NVIDIA to develop robots, and will use Dr Jim Fan's reinforcement learning virtual-world techniques for training. Kawasaki Heavy Industries will establish a robotics base on the US West coast.
Your Telegram inbox can now run itself. Assign a bot to read and reply for you, with granular control over its permissions and chat access. API docs: https://core.telegram.org/bots/features#secretary-bots …
the diffing library i'm calling "frontier" is still improving... i decided to turn json into an "adapter" and make it a universal diffing engine. it's currently beating the top libraries on npm for universal and dom formats too. next - b
@huggingface Bio released Carbon, an open DNA foundation model family. We tested a simple infra question: "Can Carbon run on @awscloud Trainium2 with NxD Inference on day one?" The answer is: Hell yes !!! Carbon-500M, 3B, and 8B all
Excited to release Delta Attention Residuals! A simple & powerful idea: route over layer deltas instead of cumulative hidden states to avoid routing collapse in deep transformers. Sharper cross-layer routing, lower perplexity, efficient fi
we gave Claude Code access to the Mosaic API. it found Mark Rober's latest video, clipped it, captioned it, and added motion graphics, autonomously. no timeline. no manual editing. just mosaic docs and one sentence. video editing is now
New Release: Erase Remove objects, erase text, and clean up details in any image. FLUX is now trained to erase and reconstruct as one task at the model level. We noticed this approach leads to images that stay clean and consistent. Give
This is text-to-CAD right now. A planetary gear assembly in CAD Explorer where users adjust drive parameters and animation speed, showing gears rotating and meshing realistically. Simulating forces on complex mechanisms strains AI spatial
A great example of our field shifting from Benchmaxxing to _Benchmaking_. Only novel results and artifacts count.
Anisotropy is good for fwd and isotropy is good for bwd.
When I want true information. It's difficult to consistently elicit truthseeking from Claude. Here is a mundane example, on 4.7 max, default sysprompt. https://termbin.com/zk9w After a few turns, it simply gives up and says "I don't kno
With Speculators, you can train your own DFlash speculator for Gemma 4. DFlash speedups are domain-dependent. A generic public checkpoint may not be the best fit for your workload. But if you train the speculator on your own tasks, it ca
At Anthropic, we feed Claude Claude-generated misaligned CoTs to test if Claude is willing to rat out Claude. Claude agrees this is likely an effective stress test of Claude's security posture toward Claude.
fastCRW just crossed 100 on GitHub. Open-source lightest, fastest, Rust engine for web scrape, crawl, search & map. Thank you to everyone who starred it; we're just getting started. http://github.com/us/crw #opensource #webscr
Memory went from 9.3% of BOM to 25.6% of BOM
My honest feedback on antigravity-cli: 0. Requires re-auth every launch despite being logged into the IDE. 1. Unreadable UI rendering during execution. 2. Zero UI hints for costs, skills, MCP, context, or current paths. 3. No auto-model all
Most “AI image apps” are just screenshots traveling to a server. I wanted: • weights on device • generation on device • images never leaving the phone So I spent the last week turning sd.cpp into an actual iOS developer experience. Sw
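One post above describes declaratively applying a schema to a SQLite database in any state via a parse, diff, apply approach. The following is a minimal sketch of that general idea, not the poster's implementation: a plain dictionary (`DESIRED`, a hypothetical schema) stands in for a parsed AST, and the diff is additive only (it creates missing tables and adds missing columns, and never drops or alters anything).

```python
import sqlite3

# Hypothetical desired schema: table name -> {column name: declared type}.
DESIRED = {
    "users": {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"},
    "posts": {"id": "INTEGER", "user_id": "INTEGER", "body": "TEXT"},
}


def current_schema(conn):
    """Read the live schema as {table: {column: declared type}}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")')
        schema[table] = {row[1]: row[2] for row in info}
    return schema


def diff_schema(current, desired):
    """Return SQL statements that move `current` toward `desired` (additive only)."""
    statements = []
    for table, columns in desired.items():
        if table not in current:
            column_sql = ", ".join(f'"{c}" {t}' for c, t in columns.items())
            statements.append(f'CREATE TABLE "{table}" ({column_sql})')
            continue
        for column, col_type in columns.items():
            if column not in current[table]:
                statements.append(
                    f'ALTER TABLE "{table}" ADD COLUMN "{column}" {col_type}'
                )
    return statements


def apply_schema(conn, desired):
    """Diff the live database against `desired`, run the statements, return them."""
    statements = diff_schema(current_schema(conn), desired)
    for statement in statements:
        conn.execute(statement)
    conn.commit()
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")  # a database in a partial state
    for statement in apply_schema(conn, DESIRED):
        print(statement)
```

Keeping the diff step separate from the apply step means the planned statements can be inspected or logged before anything runs, and running `apply_schema` a second time yields an empty plan.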