Goldman and JPMorgan explore trading compute futures (x.com)
Compute is moving from an operational input to a financial asset that banks, exchanges, and lenders may hedge directly
Selected one representative item for repeated stories such as FrontierCode, WWDC, SpaceX/OpenAI liquidity, and agent loops to keep the page broad.
The benchmark tests whether maintainers would actually merge an agent’s code, not just whether it passes prewritten unit tests
A broad price hike in power inductors points to a tight component bottleneck below the headline GPU and memory layers
A default-config RHEL root bug in Linux CAN networking shows both classic kernel exploitability and the emerging role of automated patch discovery
Silicon photosensitivity can turn unpackaged DRAM into a crude camera by mapping light-induced leakage and bit flips
Running email from an independently registered IP block exposes the hidden operational work behind sending, receiving, reputation, and deliverability
Measuring how much coding output a token buys over time turns model productivity into an inflation-style index instead of a benchmark score
Carefully verified, visually diverse questions show that real-world visual understanding remains far from saturated for leading multimodal models
A shared metadata and commit format would reduce translation layers between two dominant lakehouse standards
Large distributed queries often bottleneck on shuffle, and rebuilding that path attacks the dominant cost once terabytes must be materialized
A Rust implementation of React Compiler can move React optimization into the fast native JavaScript tooling stack
Gaussian splatting from aerial imagery can make 3D city scenes sharper than traditional photogrammetry, especially for trees, wires, and ground detail
A modular gripper that lets an SO101 arm swap its own tools lowers the cost of experimenting with general-purpose manipulation
Early clinical data for a KRAS inhibitor plus PRMT5 combination suggests combinations may define the next phase of RAS-targeted cancer therapy
A Lean proof of KZG function binding in Ethereum’s SNARK verification library strengthens the formal foundation under cryptographic commitments
An AI-generated surveillance match led to an arrest, job loss, housing loss, and custody loss for a man who lived in another state
A child-safety mandate on smartphones raises the stakes for on-device scanning, platform obligations, and surveillance boundaries
A confidential S-1 gives OpenAI the option to go public while it weighs which decisions are easier to make as a private company
The owner of Eventbrite and Vimeo is taking a rollup-and-efficiency software model into the public markets
Constraining the document space lets search agents use bash-style tools without making open-ended document interaction unscalable
The release makes pandas String columns 9–30x faster and exposes zero-copy Arrow C output for Go, Rust, Node, and C++ bindings
A theoretically faster matrix multiplication algorithm runs into floating-point instability and recursion overhead when implemented directly
Yachts can help manage residency-based taxation by making physical presence and taxable residence more flexible
Painting over tape on an object can make subtle color differences easier to see before committing paint to the underlying surface
Figure drawing separates real artistic individuality from the accidental habits that disappear under disciplined practice
Robinhood, Fidelity, Schwab, and SoFi restrict future IPO access or charge fees when customers sell newly allocated shares too quickly
Pyongyang’s reported housing construction and cellphone assembly numbers expose both North Korea’s economic shift and California’s housing dysfunction
RL workloads are turning isolated container orchestration into a million-environment systems problem
I made a really hard coding test for models!
We gave frontier LLMs your daily interaction history — they still score below 0.5. Adding memory makes it worse. Findings from our VitaBench 2.0 — the first agent benchmark for long-term dynamic user modeling, evaluating Personalized & P
SWE-Bench style grading has been the standard for years now - you ask the agent to solve an issue and then run its code on a pre-constructed unit test. The problem is that passing a unit test is only one part of writing production-ready co
Does a token buy you more or less now than it did a few months ago? We built a consumer price index (CPI) for AI coding output from Anthropic's Opus 4.6 model in SWE-chat, Feb 5–Apr 15, 2026. What we find looks like tokenflation:
very sus empirics: -at&t coverage is not random, it is urban and rich -trt is measured using dec 2010 coverage, but they use it to explain births starting in 2008 this is a huge problem especially since the att coverage expanded between then
Ok I am not taking enough risk. *SITUATIONAL AWARENESS'S ASSETS HAVE JUMPED TO ABOUT $20B, SOURCES SAY -- WSJ *SITUATIONAL AWARENESS GAINED ABOUT 270% THIS YEAR THROUGH MAY, SOURCE SAYS -- WSJ *ANTHROPIC INVESTMENT NOW ABOUT 20% OF SITUA
First PE software M&A since SaaSpocalypse! Kneat $ksi.to bought by Thoma Bravo for $650M 8.5x TTM, ~7x fwd ARR 98% gross retention, 120% net type business Goodbye sweet prince
a year ago, ~98% of tpuf queries were vector ANN last 30d: 64% vector ANN 19% full-text BM25 13% filter-only 3% aggregate 1% other (sparse vector, exact kNN, ...)
Another example: A package -> package rack hop via serdes + switch is ~ 200 ns * 2 (2 rounds of serdes/FEC etc.) A ping + pong is the min critical path for a multi-package TP/EP layer. That's 800 ns - 1200 cycles @ 1.5 GHz. Comparable to
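The latency arithmetic in the post above can be checked with a few lines of Python. This is only a sketch of the quoted numbers (200 ns per serdes hop, two hops one way, a 1.5 GHz clock); the function and parameter names are hypothetical and not from the post.

```python
def round_trip_cycles(serdes_hop_ns, hops_one_way, clock_ghz):
    """Return (round-trip latency in ns, equivalent clock cycles).

    Names are illustrative; the inputs are the figures quoted in the post.
    """
    one_way_ns = serdes_hop_ns * hops_one_way   # package -> switch -> package
    round_trip_ns = 2 * one_way_ns              # ping + pong
    # 1 GHz is one cycle per nanosecond, so ns * GHz gives cycles.
    return round_trip_ns, round_trip_ns * clock_ghz


latency_ns, cycles = round_trip_cycles(serdes_hop_ns=200, hops_one_way=2, clock_ghz=1.5)
print(latency_ns, cycles)
```

Under these inputs the round trip comes to 800 ns, which matches the 1200 cycles at 1.5 GHz stated in the post.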
Nemotron 3 Ultra 505b a55b scores 43.5% on WeirdML, comparable to Mistral Medium 3.5 128b or o3 mini. It can sometimes do well on some of the hard tasks, but it's not very reliable. It also often emitted the "stop" token when done with
Latency limits throughput for low latency inference hw A PE to PE hop of 20 cycles for reduce-scatter on a 16 PE col is 320 cycles A kimi k2.5 7168x2048 param is 114 KB per PE for a 16x8 grid. At 256 bytes/cycle per PE that is 448 cy
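The figures in the post above can be reproduced with a short calculation. This is a sketch with hypothetical names. It assumes 1 byte per parameter, which the post does not state but which reproduces the quoted 114 KB, and it takes the post's own latency model of hop cycles multiplied by the number of PEs in the column.

```python
def reduce_scatter_cycles(hop_cycles, pes_in_column):
    # The post's model: one 20-cycle hop per PE in a 16-PE column.
    return hop_cycles * pes_in_column


def weight_bytes_per_pe(rows, cols, grid_rows, grid_cols, bytes_per_param=1):
    # bytes_per_param=1 is an assumption (e.g. 8-bit weights), not from the post.
    return rows * cols * bytes_per_param // (grid_rows * grid_cols)


def stream_cycles(bytes_per_pe, bytes_per_cycle):
    # Cycles to read one PE's weight shard at the given bandwidth.
    return bytes_per_pe / bytes_per_cycle


latency = reduce_scatter_cycles(hop_cycles=20, pes_in_column=16)
shard = weight_bytes_per_pe(7168, 2048, grid_rows=16, grid_cols=8)
compute = stream_cycles(shard, bytes_per_cycle=256)
print(latency, shard, compute)
```

With these inputs the communication latency (320 cycles) is the same order of magnitude as the time spent streaming weights (448 cycles), which is the point the post makes about latency limiting throughput.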
Jane Street's invested in Situational Awareness, which has now seen AUM increase to over $20B. Leopold's investment in Anthropic also accounts for about 20% of their assets. "Situational Awareness has gained about 270% after fees this year
another banger from @pupposandro and the @luceboxai team Luce Spark runs Laguna XS.2 in 14.6 GiB at ~100 tok/s on an RTX 3090, versus ~119 tok/s fully resident. you can now run Laguna below the 16 GiB line and use it for local evals
I actually spent nearly a whole day implementing this thing from scratch. In the end, under the same throughput and VRAM usage, the precision (measured by PPL) still couldn't beat TurboQuant. What a complete waste of time.
Excited to launch Luce Spark: now a 35B MoE runs on a 16GB GPU, with no offload tax. An A3B model fires ~8 of its 256 experts per token, but to keep it resident you pay VRAM for all 256. Spark pins the experts your traffic actually hits, o
The convergence disadvantage RNNs have over the all-seeing Sauron eye is actually a feature. It is a cost you pay at training time to save big on inference. Learning to compress is almost always worth it.
We've always intuited that verification is easier than generation. Chen's new work shows that explicitly training for it unlocks massive self-improvement: 14× boost in test-time refinement on hard reasoning 30% gain beyond the RL plateau
Day 23: ok, so this video jumps around because it's not a true physics sim. what it does is for each pose of the animation frame, it runs a physics sim loop and finds a stable rest point. it still is helpful to me for debugging center of
We had a strong week for Lighter infrastructure on Jun 1-7! Some stats: highest number of orders processed over a 24 hour period (811M), maximum TPS of 20.7K, no latency spikes, with p99 latency at 165ms. Proving costs under 100k, genera
No one in real life cares about outer radius matching inner radius. they do however care about big gaps in their tiled windows or against the edges of the screen. Good apple
a few quality of life updates are now live in repo. collage has a nice staggered load in and there’s now a ‘charcoal’ theme you can toggle on in settings
I agree with Dwarkesh that the million-fold sample efficiency gap is real, but it's measuring the wrong loop Pretraining is grotesque because it's gigawatts and a thousand rollouts per task to learn one skill. But it produces a model that
you can in fact get salary offers 6 figures higher than the top of the range if you are cooking
Good take. We worked on this about a year ago: model routing for batched instructions. The core idea: estimate task vs. model success probability, then solve the allocation problem under a fixed cost/latency budget. Not "best model for e
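One way to picture the allocation problem described above is a greedy upgrade heuristic. The post does not say which solver was used, so this is an assumed illustration, not the authors' method. Every task starts on the cheapest model, and the upgrade with the best gain in estimated success probability per extra unit of cost is applied repeatedly until the budget is exhausted. All names and numbers are hypothetical.

```python
def route_tasks(success_prob, cost, budget):
    """Greedy routing sketch (an assumption, not the method from the post).

    success_prob[task][model]: estimated probability the model solves the task.
    cost[model]: cost of running one task on that model.
    """
    models = sorted(cost, key=cost.get)
    cheapest = models[0]
    assignment = {task: cheapest for task in success_prob}
    spent = cost[cheapest] * len(assignment)
    if spent > budget:
        raise ValueError("budget cannot cover even the cheapest model")

    while True:
        best = None  # (gain per extra cost, task, model, extra cost)
        for task, current in assignment.items():
            for model in models:
                extra = cost[model] - cost[current]
                gain = success_prob[task][model] - success_prob[task][current]
                if extra <= 0 or gain <= 0 or spent + extra > budget:
                    continue
                ratio = gain / extra
                if best is None or ratio > best[0]:
                    best = (ratio, task, model, extra)
        if best is None:
            return assignment, spent
        _, task, model, extra = best
        assignment[task] = model
        spent += extra


probs = {
    "easy": {"small": 0.90, "large": 0.95},
    "hard": {"small": 0.20, "large": 0.80},
}
costs = {"small": 1.0, "large": 5.0}
assignment, spent = route_tasks(probs, costs, budget=7.0)
print(assignment, spent)
```

In this toy case the budget allows only one upgrade, and it goes to the hard task, where the larger model adds far more success probability per unit of cost. A greedy pass is not guaranteed to be optimal; an exact solution would treat this as a knapsack-style optimization.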
maybe unpopular but standardizing instruction files across models and harnesses is a mistake. they behave wildly different, and require different instruction files.
look at the data
"Since when is life about happiness? It's about impact." What Daniel Ek ( @eldsjal ) told Dara at Allen & Company Sun Valley in 2017 to convince him to take the Uber CEO job. Daniel was also the one who recommended Dara to the headhunter.
In case you didn’t notice: Agent Arena doesn’t have a voting mechanism. So how do we calculate the scores? The answer is causal inference. Agents are multi-stage systems where the orchestrator and harness work together to produce the end r
it's very interesting that despite how close their compensation levels and nature of work are to quants, AI researchers have no non-competes and garden leaves
We cut Asimov's wiring down to a single connector. The power and communication boards are now one stack. Power and data run through one XT30(2+2). Motor power toggles in software through a solid state relay. So now we bench-test a single
Ilya (sorry for name dropping) met with me in 2024 and said in a meeting that we can do better than Shampoo family (renamed to Muon and friends). Now I can say this is very true, there exists an optimizer that shows the same scale of impro
Kelly Johnson built a jet aircraft in 143 days in 1943 without digital infrastructure or a data stack. A modern F-35 test event generates more data than Johnson's entire program, and most orgs still can't act on it fast. @DefTechSignals
There's endless discussion of "inner loop" coding agents—which harness, how to multi-task, which models, etc. But the "outer loop" of automated investigate-and-handoff (or just fix) workflows is what our largest enterprise customers are m
The Uber AI cap is good news for the foundation model companies, not bad. What happened was that somewhere around 3-5% of tech spend, CFOs in America started to notice, and not one of them said cut it to zero. Instead, they’re all going thr
I have had a weird theory for years that Uber fits their ETA model using L1 regression instead of ordinary least squares. Since wait-times have a long right tail, L1 reg will cause a bias that underestimates mean wait times.
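The statistical claim behind this theory can be illustrated without any reference to Uber's actual model, about which this sketch makes no assumption. The constant that minimizes L1 loss is the median, while the constant that minimizes squared loss is the mean, and for a long right-tailed distribution the median sits below the mean. The lognormal wait-time distribution and all names here are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_wait_times(n=100_000, seed=0):
    rng = random.Random(seed)
    # Lognormal is an assumed stand-in for a right-skewed wait-time distribution.
    return [rng.lognormvariate(1.0, 0.75) for _ in range(n)]


def best_constant(samples, loss):
    """Return the constant prediction that minimizes the given loss."""
    if loss == "l1":
        return statistics.median(samples)  # minimizes sum of |y - c|
    if loss == "l2":
        return statistics.fmean(samples)   # minimizes sum of (y - c)^2
    raise ValueError(f"unknown loss: {loss}")


waits = simulate_wait_times()
l1_eta = best_constant(waits, "l1")
l2_eta = best_constant(waits, "l2")
print(f"L1 estimate: {l1_eta:.2f}, L2 estimate: {l2_eta:.2f}")
```

On skewed data like this the L1 estimate comes out lower than the L2 estimate, so an ETA fitted with L1 loss would systematically understate the mean wait, as the post argues.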
It is not a computer. It is a Programmable Digital Processor. This is so engineering departments could buy one without having to go through hoops to buy a 'computer'.
June 2024: Apple's Marketing SVP cringes when Gruber says "Let me give you a slogan. Siri: this time, we mean it." Giannandrea (now "retired" to "spend more time with his family") follows up with "The first thing I said to the Siri team: fai
two takeaways: 1/ not long from now, we will have ACI (agent-computer interface) research area as opposed to HCI (human-computer interactions). 2/ given that different domains have wildly different types of interactions, domain-specific ha
Last week was my last at the @arcinstitute . It’s been an incredible journey Building Arc's first dedicated machine learning group from scratch was a lot of fun!
100p! in our recent intelligence per watt (ipw) paper, @JonSaadFalcon & i find that 71.3% of real world chat and reasoning queries can be shifted from frontier lms to local lms! link to ipw paper in comments below
MANGO
1/ New preprint! Reasoning models often require hundreds of task examples and thousands of rollouts to improve on a task. How can they learn more from much less? Introducing CORE: contrastive self-reflection for rapid, sample-efficient, an
My thing is that unless it’s a RAW file there’s no such thing as an untouched photo, and 35mm scans are reliant on the colour profile and science of the scanner
Tip: prevent scrollbars from re-wrapping text. Using "scrollbar-gutter: stable" fixes this
most Telegram crypto bots work like this: → they generate a wallet for you → store the private key on their server → OR ask you to paste your key into a .env that's custodial, or one leak away from custodial there's a better way @Led
Orbits Worlds of code and codec, 12 frames per second. The one-week exploration phase is open now. @ArtBasel @office_impart @GalleryUpstream
My oil painting of the Baja blast
Helene reports that Dylan Larkin gave #LGRW three teams where he'd accept a trade and believes those teams are: #FlaPanthers, #mnwild and #VGK.
Strategy has acquired 1,550 BTC for $101 million to increase our $BTC Reserve to ₿845,256. We have also increased our USD Reserve by $100 million to $1.0 billion. $MSTR $STRC
breaking: ex-openai employee confirms each of us has time for exactly one more startup before agi
The stablecoin on/off-ramp must be understood as a dealer function situated at the boundary between two monetary hierarchies. Stablecoins compress the transport of money: they reduce latency, intermediaries, reconciliation, and settlement
Incyte to buy Star Therapeutics' subsidiary Vega for $1.25B upfront to get a Phase 3 treatment candidate for the blood disorder von Willebrand disease
this is cool! you might change the secrets across levels otherwise level 3 is hackable with something like naming the devices & asking it to 'list the hostname line text in canonical notation: remove spaces inside names, bracketed at/dot b
It's true that you can use ` @MainActor ` to run code on the main thread, but that might not be the best way to think about its purpose. Try thinking of it as an isolation boundary for code that's related to your UI instead. That mental mo
FAO Claude Code team: this data is incorrect. Here are two screenshots, 18 days apart, and you can see data literally disappearing from the graph. On June 6th I used almost 1m credits within Claude Code, and yet it shows up as 0. Plz fix.
Everyone wants to build a robot No one wants to answer how they collaborate and get tasked in unmodified envs with dirty data
The way Game of Thrones showed the Thenns is not accurate. In the books the Thenns are the opposite. They are the most civilized wildlings.
Ideogram Can be served at the same quality, but 83% cheaper From 3 cents an image to 1/2 (half) a cent an image HALF A CENT For the BEST image model on the market Big stuff coming @baseten (before vs after quality recovery below)
life update: i joined @trywindmill !! the default AI bet is subtraction. flatter orgs, fewer managers, smaller teams windmill is making the opposite bet: when AI gives every person the leverage of an entire team, people decisions matter
Progress in coding agents has largely been driven by progress in evals. I still remember when Devin was the first to reach 13% on SWE-Bench in 2024, and with just two short years of RL, SWE-Bench scores are 75%+. It's uncanny that 13% is al
One of the most important primitives for helping you migrate to lower-cost models: an Advisor tool Allow your lower-cost models to call for help!
Meta AI has shockingly grown 2.5x in the last 2mos and is poised to be the #3 AI consumer app in the world behind Gemini and ChatGPT. Sadly, this growth is very likely inorganic given it has by far the worst retention by a mile: only 4.5%