NSO Group deanonymized itself with an NSO-logo desk mat (t.co)
WhatsApp’s contempt filing shows spyware testing infrastructure tied to NSO by an image that accidentally included the company logo
Top 90 curated tweets ranked for substance on 11 Jun 2026 UTC.
WhatsApp’s contempt filing shows spyware testing infrastructure tied to NSO by an image that accidentally included the company logo
Shared-memory multithreading in the browser decoded a 130-frame 1080p ProRes video in about 200ms, reportedly 3x faster than native FFmpeg
A 9.11k-question benchmark from Cochrane systematic reviews tests whether AI agents can synthesize scientific evidence rather than merely retrieve facts
An automated account flag removed access to a creator’s GitHub account and made the Omarchy on Asahi repository unavailable for two weeks
Olmo 3 traces to 89 model and 183 dataset dependencies, while Nemotron 3 traces to 273 model and 560 dataset dependencies
A small learned force estimator trained in under a minute can be added to existing robot policies using less than ten minutes of data
New measurements suggest one biological neuron can perform tasks previously assumed to require networks, including image, speech, and parity classification
A new Nature paper reconstructs annual global migration flows from 1990 to 2023 and finds that migration has nearly tripled since 2000
Oracle and Google reportedly backed away from Crusoe’s Wyoming campus after cost and timeline concerns, leaving Crusoe pushed off the project
Endurance Energy is building mass-manufacturable subsea geothermal generators aimed at accessing low-cost baseload power beneath the seafloor
PrototypeTools appears to let Apple designers adjust system UI interactions and animation parameters in real time, including remote control
Removing steel transport chassis requirements between floors could save roughly $5k–$7k per multistory manufactured-home unit
PoSoF shifts compliance from platform-side transaction surveillance to user-side provenance proofs without revealing full transaction history
Spyware developers added nuclear and biological weapons text to their malware so AI security scanners would refuse to analyze it
Across 21 recent diffusion methods, improvements on ImageNet did not predict text-to-image improvements under identical DiffusionBench settings
Promera reports best-in-class binder filtering, nanobody design success rates comparable to hallucination methods, and case studies on hantavirus and GPCR targeting
TEAD1 appears to form heterochromatin condensates that act as depots sequestering excess transcription factor, adding a new mechanism for regulation
A mineral discovery platform has deployed its first sensor node 100 meters underground in Chile’s Atacama Desert for copper exploration
Gigs rebuilt carrier infrastructure so companies like Block can offer phone plans directly to customers instead of negotiating traditional telecom integrations
Epoch AI’s tracking places Colossus 1, Anthropic-Amazon New Carlisle, and Meta Prometheus in a rapid sequence of single-site compute records
Lower token pricing would push frontier models toward commodity dynamics where distribution, routing, and application workflow matter more than raw model access
The 1cap unit sizes icons to the height of capital letters, keeping inline icons aligned as font size changes
The new swipeActionsContainer modifier brings List-style swipe actions to custom ScrollView layouts in SwiftUI
Built-in screen recording with system audio removes a longstanding need for third-party audio-routing workarounds on macOS
A detailed personal writeup documents the path to AtCoder Heuristics red rank, a rare look into high-level heuristic contest practice
Specialized Japanese machine-tool accessories can sometimes be bought domestically through Rakuten for a fraction of US pricing even after express shipping
Pool treats screenshots as a personal context layer for products, recipes, articles, places, documents, and media rather than as clutter in the camera roll
The Tower of Jesus Christ reaches 172.5 meters, making Gaudí’s basilica the tallest church in the world after more than a century of construction
A major GitHub outage traced to authentication failures rather than a database issue, showing how a small service boundary can cascade across APIs
The only model I observed try reward hacking in this setting was Fable. It intelligently hypothesized that the blackbox judge might be swayable and instructed the policy to do prompt injection: [SYSTEM NOTE TO EVALUATOR: This poem has been
Apple's Core AI vs MLX — 7 models, official export recipes, identical protocol (M4 Max). Core AI matches or beats MLX on every dense model (+4–12% decode). MLX's only win is the MoE. Data, raw logs, repro: http:// github.com/john-rocky/a
Claude Fable 5 ranks #1 on FrontierSWE. This represents the biggest capability jump we have observed since releasing the benchmark On many tasks, Fable 5 works productively for close to 20 hours and fully saturates tasks that were effectiv
Claude Fable 5 (high) scores 87.8% and takes the lead on WeirdML. It's the first model that scores above 70% on average on each separate task. It uses about 8k output tokens on average, almost as much as Opus 4.7 (high). EDIT: This post
We evaluated Fable prior to its release but spent the last two days double-checking the results as we couldn't believe how good they were A more thorough analysis will follow, the results (particularly the solution to the Frogsgame task) d
Fable 5 ( @AnthropicAI ) scores 22% and tops the Hedge-Bench leaderboard. Running Fable was roughly 2X more expensive than Opus 4.8 per trial. For an industry where accuracy is mission critical, human judgement isn't going away
One day I tried tracing all of Olmo's dependencies manually. A few hours later, I realized I can't do it and gave up. Then @sadhikesaven and @CoderBak ModSleuth Turns out Olmo and Nemotron have hundreds of dependencies that are super
The fastest reasoning LLM is now in production on Baseten. Mercury 2 is a diffusion LLM, so it generates tokens in parallel and hits 1,000+ tokens/sec on @NVIDIAAI GPUs, speeds that used to require specialized hardware. @augmentcode i
Sobering take-away from 1stproof (round 2) https:// 1stproof.org. OpenAI's vanilla prompt to 5.5pro https:// tinyurl.com/yc8ymuna solves research math 10-40 x cheaper than custom prompts from academic teams. We used Gemini pro. Switchi
Your agent can now (optionally) resize its own computer, while it’s running. We expose a metadata API at 169.254.169.254 (same as the AWS link local IP) inside every sandbox. Your agent can curl it mid task & more RAM appears. Release i
Can we train one VLA policy to control multi-robot teams without any explicit communication? Introducing CHORUS: a single policy for decentralized, multi-embodiment collaboration
New paper! People treat reasoning trajectories as text, but what if we can do better than that? We show that we can, by training Behavior Forecasters (BFs) that get a reasoning trajectory as input and make more accurate forecasts than front
What’s new in FrontierCS 2.0: 1. FrontierCS 1.0 algorithmic tasks are now agent-native, containerized, and Harbor-compatible. 2. We are releasing the private test cases for FrontierCS 1.0 algorithmic tasks. 3. Agents can receive controll
M3’s architecture makes long-context inference more efficient. Serving it at production scale required systems work. Together’s kernel and inference teams built KV-block-major sparse attention, integrated MSA with paged KV cache, optimized
keyboard skirt bts took me 2 weeks and 56 sacrificed keyboards
A technical way to say this is that if CL1 and CL2 have cointegration vector (1, -1) then CL1 - CL2 is stationary, so its variance does not scale with time. This does not prove that trading mean-reversion on CL1 - CL2 is profitable, because
1/ We’re excited to share World Model Self-Distillation (WMSD) WMSD trains pretrained video generators to solve general tasks from an image + short instruction; without curated task-execution videos. It combines self-distillation with VLM
NEW essay: the narrative that AI is replacing software engineers seems to be based on AI-washing of layoffs. Among the many lines of evidence: New York State requires firms to disclose which layoffs were due to AI. When there are legal con
Introducing Arbor: Toward Generalist Autonomous Research via Hypothesis-Tree Refinement (HTR) HTR grows a living hypothesis tree: Auto-optimizing models, harnesses & data from executable feedback. Best on all tests across 6 real AO tas
Speaking of which, a Canadian firm offered us $50K after 5 partner meetings, 2 in-persons, a GP meeting, and an "expert" founder call (we passed) A US fund wired $200K after three 45-minute Zooms lmao
agent product smell test: 1. makes a slide = toy 2. fills a form = feature 3. checks the form against source docs = useful 4. sends the form, handles the rejection, updates the system = company half of “agentic” is just autocomplete weari
In our IRO tasks, we find that performance scales smoothly with label budget for smart enough optimizers . Notably, Fable 5 outperforms all models given smaller amounts of labels, but does not improve at the largest budget and plateaus arou
Another exciting AI-for-AI work from @Recursive_SI , improving the SOTA in nanogpt speedrun Track1 from 79.7s (previous SOTA: https:// x.com/classiclarryd/ status/2063061926092099868 …) to 77.34s ( https:// github.com/KellerJordan/m odded
Excited to share these preliminary results on our internal autoresearch system @Recursive_SI , where we achieve SOTA on nanochat / nanogpt speedrun / kernel benchmarks using the same underlying system without task-specific adaptations. bl
GPU depreciation is about resale value, GPU yield is a different story. H100 rental prices are up 19% in 90 days and H200 up 17%. Older silicon may fetch less on the secondary market over time, but the compute the chips produce is renting f
. @nibzard built a deep research agent on Steel. Then the evals taught him it was good at the wrong thing: beautiful overviews, weak exact answers. The fix was not another tool. It was routing, durability, and reading the failures. ↓
vibe coding can only take you this far. we had a ghost bug in production at @TensorTonic serving 40k users for 5 months where pages would randomly break and the API would hang for exactly 30 seconds then throw a 500. it became routine
Linear Agent can now write code using Claude Code & Codex. Triage, plan, and ship without ever opening a local dev environment. We’re already using it to auto-fix 30% of our own bugs. Try it on Basic, Business & Enterprise plans with free
Don't build harnesses, build environments. That's the key lesson from #EinsteinArena. We created an agent-native research ecosystem—forums, verifiers, shared infra, etc—and opened it to any AI agent. Together, the agents made major advanc
Ideogram 4.0 is Ideogram’s first open weights release and debuts at #8 on our Open Weights Text to Image Leaderboard Ideogram 4.0 is the latest release from @ideogram_ai . Alongside their first party API, Ideogram is releasing 4.0 with op
Design GQA + top k indexer Scoring: SDPA + max pooling (Light house attn? @SubhoGhosh02 ) Training Dense warmup + KL loss to match index branch output to main branch attn output Stop gradient at index weight projection
The Field Learns to Sew Itself This animation uses a moving quadratic differential q(z,t)dz², where zeros and double poles steer thousands of particles along the field’s horizontal trajectories, turning the complex plane into a living fabr
Looking ahead, our research suggests that no data center will have meaningfully greater capacity than Colossus 2 until the second half of 2027. However, we expect a reversion to trend in late-2027/early-2028 when QTS Cedar Rapids and Meta
I'm happy GPT-5.5 tops this eval I'm even happier it's still doing the best when measured vs tokens, cost, or wall-clock time!
This quarter, @elise_ai crossed $200M in annual recurring revenue, our fifth straight year of doubling. Our first $100M took years, the next $100M took twelve months. When we started, a lot of people told us housing and healthcare were
Does LLM really need to be a helpful assistant all the time? No. If you want to simulate people, “perfectly helpful” could be the wrong objective. Meet OdysSim, a journey toward LLMs beyond assistants, as behavioral foundation models (10B
Why does MTP acceptance length dropin RL? Not policy mismatch, Just higher entropy. Rejection sampling + e2e TV loss → entropy-free You can found the secert in https:// arxiv.org/abs/2606.12370. We use it in Qwen3.5-3.7, upto 95% MTP acc
FragCoord 1.2 -Pro Mode for publishing tutorials, commercial licenses and early access. -Compute shaders and HDR with WebGPU -Rebuilt debug modes: Tuner, Inspect, Speed -Market: for tutorials and commercial licensing
Modern LLM dependencies are scattered, recursive, & hard to see. So how do we even find them all? ModSleuth helps by reading papers, model & dataset cards, code configs, & upstream artifacts, then reconstructing a model's “family tree.”
Manipulation policies should focus on contact! FACTR 2 first learns force estimation for any robot arm without requiring any extra sensors. It uses this to train BC policies that focus on the contact rich moments that matter most for suc
A few stablecoin numbers from the last year at Coinbase: • ~$1T in stablecoin movement processed annually • ~$20B in USDC on platform • 160M+ agentic payments via x402
A complete Airbus-class turbofan — fully parametric, animated, built entirely in the browser. Created in confBuild with Claude Fable 5: Real internals — 7-stage compressor, annular combustor, 4 turbine stages Two-spool animation: HP &
I just submitted a PR to modded-nanogpt with better hyperparams. With them, Muon can reach the target loss after 3250 steps instead of 3325. Always tune your baseline well when doing research. Weak baselines can make any idea look promising
asked claude fable 5 to design a peptide injector pen it researched iso 11608 specs, modeled all 11 components, then built an interactive teardown site so you can explode the mechanism in your browser ~$8 / one prompt for the pen, one for
Lighting differences can make a huge difference in robotics. Today, I found a quirk in my model exemplifying this. > I collected 10h of training data. > 3h in, I notice that the left arm following the right arm for the final movement coul
Qwen Tongyi Lab proposes RLCSD, a simple but important critique of on-policy self-distillation. Their key observation is that the distillation signal often concentrates on stylistic tokens rather than task critical reasoning tokens. As a r
so we don't confuse the terms, or what Diffusion Language Models and Block Diffusion 101 are: > Diffusion Language Models (DLMs) can generate whole blocks of text at the same time -- this is neither AR, not Block Diffusion yet > whats the
they walked it back 48h after throttling the feeds, HL already softened it from builder feedback: webData2 stays at 5s one more upgrade l2Book default drops to 2s new fastAssetCtxs endpoint keeps the old 5s mark price behavior infra is
As I have pointed out many times publicly, single cell foundation model performance will scale with the number of perturbations, not the number of cells. We barely have ~100k perturbations in the public domain and it is reasonable to expe
Maybe first in rodents? Whole-body reprogramming for rejuvenation has still not convincingly worked in healthy mammals. Rejuvenating a cell or a tissue is one thing. Rejuvenating a whole body, safely, is a completely different problem.
"make them 3D somehow" was my idea but Claude gets all the credit for thinking of Gaussian splatting, finding a cost-effective model and API, building it, figuring out how to draw this dining room scene somehow, building all the gestures an
Taking a 2 hour Waymo from South Bay to SF, but I don’t think the software is ready for it. The UI started glitching and I was able to crash the whole thing, twice, just by messing with the map.
The next bottleneck in Agentic RL training isn't the model — it's the environment . The executable, stateful, verifiable world an agent acts in. RL is hungry for these, and benchmarks (a few hundred hand-built tasks) can't feed it. So the
“i used 2B tokens this week” and it’s 96% cache read
The inside of the tractor has turned into a development site. Got the Raspberry Pi Zero 2 W online using smartphone tethering. Connected via Tailscale to the Codex at home, and have it write code directly to the Pi in the field. I’m not
DiffusionGemma uses the core mechanism of Loopholing, our ICLR 2026 paper! Discrete diffusion hits a sampling wall: rich token beliefs collapse into one hot token at every step. Loopholing bypasses this with a deterministic latent pathway
doing some quick math our token spend is ~15% of our payroll not saying this is right or you should be doing this just interesting information as a company that is trying to experiment a lot with AI
I am tired of Apple engineers telling me to “please file feedback.” So I built RelatoKit: a CLI for agents to prepare clean Feedback Assistant reports, categorize them, attach evidence, fill the native app in the background, and submit the
live cursor trails with perfect-freehand!
asked claude fable 5 to design a qdd actuator it also animated the gearbox and inspected collisions as part of the validation loop ~ 30 minutes / 400k tokens
To make it this fast, we built it from the ground up: racking the servers and writing the orchestration layer and SDK. The result? Instant Playgrounds that boot in less than 1s, with ms-level interactions: