AI-driven formal proof search for open Erdős problems
A Google DeepMind paper describes autonomous Lean-based proof search resolving open Erdős problems at per-problem costs of only a few hundred dollars
Chose one representative per repeated story where possible, especially around AI math results, agent tooling, and semiconductor rumors.
On iOS and macOS, WhatsApp chat databases can sit plaintext in a Meta app-group container, making same-developer app boundaries much weaker than users expect
TrapDoor shows active attackers coordinating malicious releases across npm, PyPI, and Crates.io rather than targeting a single package ecosystem
Microsoft’s proposed Nevada tariff would assign new data-center grid costs to the large customer that causes them, a concrete model for AI power load politics
DeltaBox attacks a systems bottleneck for agent tree search: cheap process and filesystem checkpointing without copying an entire sandbox every branch
Bun is using continuous parser fuzzing plus minimized repros routed to Claude, turning crash discovery into a mostly automated repair pipeline
Expanded FedWire access helps fintechs only if they can solve the hard part: funding and managing settlement accounts, not just connecting to the rail
A $10M exploit hit EURR and USDR through a 1-of-3 minting multisig, showing how stablecoin designs still collapse around key-management assumptions
vLLM’s fake-issue PR episode shows open-source maintainers now have to defend review time against resume-driven AI slop
The newly opened 86-DOS 1.00 source is the pre-IBM PC DOS codebase behind the compatibility lineage that shaped personal computing
The Marin team preregistered a large MoE training loss before launch and beat it, making scaling-law forecasting auditable instead of retrospective
China’s HTR-PM pebble-bed module moving toward 2027 commercial service is a live test of whether advanced nuclear can be built repeatably and economically
Vietnam’s export boom masks a development trap: manufacturing exports reach 90% of GDP while high-value components are still imported
SGLang OOMs are not one failure mode; serving memory can disappear into weights, KV cache, activations, fragmentation, queues, or scheduler choices
Pairing Hopper with an MCP server gives Apple engineers a practical workflow for interrogating private frameworks and dyld shared caches with code agents
Hyper experiments with a shadcn-style API framework that is copied into your repo, giving routes OpenAPI, typed clients, and MCP without a runtime dependency
A tiny DNS allowlist for in-flight WiFi portals solves a real travel problem for users with VPNs, DNS filters, and locked-down company devices
SendCutSend’s homegrown Toyota Production System shows modern manufacturing advantage coming from operational discipline, not just software wrappers
Milgrom-Roberts complementarity explains why Japanese firms historically combined low hierarchy, broad worker tasks, lifetime employment, and unrelated product lines
A SwiftUI-to-AppKit rewrite producing 3x speedups is a useful counterexample to simplistic framework debates: abstractions can be fast, but the escape hatch still matters
Conditional protein diffusion points generative modeling at protein design, where the artifact is a physical molecule rather than an image or text sample
PhysX-Omni packages sim-ready generation, datasets, and benchmarks for rigid, deformable, and articulated objects, addressing a missing layer in robotics data
TurboQuant+ shrinks KV cache memory 4.75x with 3-bit quantization across CUDA and Metal while preserving near-fp8 top-5 behavior
An xterm.js rendering fix is a reminder that terminal correctness still depends on unglamorous edge cases that affect thousands of developer tools
Framing code quality as CI and observability shifts the debate from aesthetics to whether failures, performance, and regressions are visible
Austin’s HOME initiative shows a concrete zoning change: allowing three homes on single-family lots can create family-sized infill without megaprojects
A simple SF webcam aggregator turns hyperlocal fog and microclimates into usable trip-planning data that forecasts often miss
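A quick sanity check on the TurboQuant+ item above, assuming the 4.75x figure is measured against a 16-bit (fp16/bf16) KV cache; the baseline is an assumption, not something the item states. Under that assumption the effective storage per cached value works out to slightly more than the nominal 3 bits:

```latex
\[
  \frac{16\ \text{bits}}{4.75} \approx 3.37\ \text{bits per value},
  \qquad
  \frac{16\ \text{bits}}{3\ \text{bits}} \approx 5.33\times \ \text{(ideal, no overhead)}.
\]
```

The gap between 3.37 and 3 bits would be consistent with per-block metadata such as scales, but that reading is a guess rather than a detail from the source.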
Next up, Anthropic on SWE Bench Pro. This is where we see bigger jumps rather than incremental ones. Opus 4.6 scored 53.4%, Opus 4.7 hit 64.3%, and Mythos Preview jumped to 77.8%.
"Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geome
OMG! 97.8 TPS single 3090 qwen3.6 27b dense. (P.S. No overclocking yet) Recipe cooking at
Our inference stack, optimized for Blackwells, with a novel attention kernel and many new optimizations has started rolling out! It's already charting on Artificial Analysis, eg: #1 speed and latency for @Kimi_Moonshot Kimi 2.6. #1 on l
context-mode keeps raw tool output out of your AI agent's context window. 98% reduction · 15 platforms · 15.5K stars · 1.1K forks · No telemetry Used across teams at: Microsoft · Google · Meta · Amazon · IBM · NVIDIA · ByteDance · Stripe
One of the deepest realizations I’m having about systems/GPU/distributed computing: "Hardware does NOT create parallelism." It only consumes parallelism that already exists in the data/workload. That changes everything. Parallelism fund
When SGLang OOMs, What Exactly Runs Out of Memory? Around two months into fully developing SGLang Omni, roughly this April, we got a brand-new H100 on top of the H200 development machine and the H20 CI machine we already had. That meant one
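One way to make "what exactly runs out of memory" concrete is a back-of-envelope budget for two of the larger consumers, weights and KV cache. The sketch below is not SGLang code: the function names, the 10% reserved slice, and the model dimensions in the usage note are all hypothetical, and it ignores activations, fragmentation, and scheduler effects.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


def weight_bytes(num_params, bytes_per_param=2):
    """Bytes needed to hold the model weights (2 bytes per parameter for fp16/bf16)."""
    return num_params * bytes_per_param


def memory_budget(total_bytes, num_params, reserved_percent=10):
    """Split device memory into weights, a reserved slice, and what remains for KV cache.

    reserved_percent is a hypothetical allowance for activations and fragmentation.
    """
    weights = weight_bytes(num_params)
    reserved = total_bytes * reserved_percent // 100
    kv_budget = max(total_bytes - weights - reserved, 0)
    return {"weights": weights, "reserved": reserved, "kv_budget": kv_budget}


if __name__ == "__main__":
    # Hypothetical 8B-parameter model on an 80 GiB device.
    budget = memory_budget(80 * 2**30, 8_000_000_000)
    per_token = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=1)
    print(budget)
    print("KV bytes per token:", per_token)
    print("Tokens that fit in the KV budget:", budget["kv_budget"] // per_token)
```

With these made-up dimensions each cached token costs 128 KiB, so long contexts and large batches can use up the KV budget even when the weights fit comfortably.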
Happy to share our new ICML oral in Pretraining: OPUS! It tells: which tokens should the model train on at each step? Instead of static data filtering, OPUS dynamically selects tokens based on the optimizer-shaped updates. Less “more d
This. Understand the code you're responsible for. "My AI agents one-shotted this app overnight and I didn't read any code and just shipped it" is not impressive; it's irresponsible.
Александр Рыжков (Kaggle Grandmaster, LightAutoML team lead) published a comprehensive Russian-language DVC tutorial covering data versioning, S3 & Google Drive backends, DAG pipelines, and when NOT to use DVC. https://hubs.la/Q04fRcSy
yeah that's it: each link is its own DO for: - upload state - capability auth - per-share limits - expiry/cleanup lifecycle - concurrency control. R2 holds the file bytes. There is also a separate DO for deployment-wide quota limits
slime Source Code Walkthrough: SGLang-Native Inference Architecture (Part II) Evaluation & insights from Zhihu contributor 鲸饮未吞海 Inference Data Plane Chapter 2 explained how the system is launched. Now we switch perspective: after every
[1/4] The human eye doesn't process every single pixel of a video continuously—it focuses on what changes. So why are our video AI models wasting compute on redundant frames? Introducing Swift Sampling: a test-time technique inspired by t
If you have an Nvidia RTX 4090, --ddtree-budget 36 is the best configuration, buying you a 2.5x speedup during decoding for Qwen3.6_27B. Thanks for the benchmark https://github.com/1TommyCheung
New NanoGPT Speedrun WR at 81.8 (-2.6s) from @.Lisennlp on Github with MUDD skip connections, an expressive and efficient mechanism for data dependent skips! Instead of a learned scalar or sigmoid(linear) gate, MUDD uses a 64 neuron 'MLP' t
Qwen inference team is super great: they achieved 540 TPS on TokenSpeed for agentic workloads. Looking forward to them sharing more optimization details soon. Stay tuned. https://github.com/lightseekorg/tokenspeed … Enjoy!
New: Traces for Mixedbread agentic search See every search call an agent makes directly in the dashboard, and tune instructions for better retrieval quality.
New in emulate v0.6: way more Slack Emulate Slack in CI + agent sandboxes CLI or Next.js adapter Messages Threads Channels DMs MPIMs OAuth Inspector SDK tests Scopes Profiles Presence Files Uploads Pins Bookmarks Apps Modals Webhooks Eve
Workbench - open-source BullMQ dashboard, drop-in for any Node backend. Flows, metrics, schedulers, search. MIT. Link
Think ⨉ Skills --- https://github.com/cloudflare/agents/pull/1584 … - supports the https://agentskills.io spec - load via local filesystem/codebase, or r2 (git coming soon? maybe) - configurable permission model - working js/python/ba
@saturdayrobotic & World Model Reading Club 09, Part 1 Recap, @CVPR Warm-up: keynote @tommiekerssies , hosts @junfanzhu98 , @aurorafeng_01 , @zoeytzh A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tok
What's the secret sauce behind leaning attractors? Stable & amortized optimization. Here is the controlled study of how we built EqR!
How did Tensor Cores massively increase the throughput of Nvidia chips? @reinerpope explains the fundamental idea, systolic arrays:
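The systolic-array idea named above can be illustrated with a small simulation. This is a minimal sketch of an output-stationary schedule, not a model of any particular Nvidia design: the function name and the skewed timing (cell (i, j) handles inner index k at step t = i + j + k) are illustrative assumptions.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Each cell (i, j) keeps one accumulator. Operands are skewed so that the
    pair a[i][k], b[k][j] reaches cell (i, j) at time step t = i + j + k.
    Returns the product and the number of time steps the schedule takes.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    steps = m + n + k_dim - 2  # last pair arrives at t = (m-1) + (n-1) + (k_dim-1)
    for t in range(steps):
        # Within one time step every cell works independently of the others.
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, steps


if __name__ == "__main__":
    product, steps = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, steps)
```

The point of the schedule is that a grid of m x n multiply-accumulate cells finishes in m + n + k - 2 steps, instead of the m * n * k steps a single multiplier would need.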
There is now a smarter way to pick data for training LLMs! Enter OPUS! This is an ICML Oral paper from SJTU, Alibaba, UW–Madison, UIUC, and Mila - Quebec AI Institute. The proposed method dynamically and intelligently selects the most im
Fast browser agents are getting cheap. Composer 2.5 in Pi drove Sauce Demo checkout in a headed browser and placed the demo order. 39s end-to-end 28 browser tool calls ~7.7k Pi-estimated model tokens pi-agent-browser-native v0.2.33 What
>You pay per token, in & out. Input tokens are cheaper if they’re cached, but it costs to write them to a cache, and the cache expires when we decide. If you add an image, it may invalidate the cache. We decide how many tokens to output, an
the mental model that finally made agents click: a stateless reducer on a stream
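A minimal sketch of that mental model, assuming "stateless" means the step function keeps no hidden memory and all state is passed in and returned. `AgentState`, `step`, `run`, and the event shapes are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class AgentState:
    """Everything the agent knows; immutable so each step returns a new value."""
    messages: Tuple[str, ...] = ()
    done: bool = False


def step(state, event):
    """Pure reducer: (state, event) -> new state, with no side effects."""
    if state.done:
        return state  # events after a stop are ignored
    kind, payload = event
    if kind == "stop":
        return AgentState(state.messages, True)
    return AgentState(state.messages + (f"{kind}: {payload}",), False)


def run(events, initial=AgentState()):
    """Fold the reducer over a stream of events."""
    return reduce(step, events, initial)


if __name__ == "__main__":
    stream = [("user", "hi"), ("tool", "42"), ("stop", ""), ("user", "ignored")]
    print(run(stream))
```

Because `step` is pure, replaying the same event stream always rebuilds the same state, which is what makes checkpointing and resuming straightforward in this framing.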
I’ve left Google DeepMind. The last two years have been an incredible whirlwind. A couple years ago, I joined a small startup called Codeium. There, I got to ship Windsurf, train SWE-1 (a frontier agentic coding model), go to DeepMind in
NYC weekends really do feel rainier... I ran more data and built a full data investigation to see if the pattern is real. Across 2,192 days (6yrs now!) of weather data, NYC rain rates by day of week look like this: Mon: 27.8% Tue: 32.9% W
Paper here: https://arxiv.org/pdf/2502.12170. The MUDD coefficients are used for many purposes, such as routing multiple layers into future attention values, modulating the value embedding, modulating the bigram embedding, etc. (Delay on
Send help! The first part of my blog series on "how to profile like a noob" is going out of bounds now. I am here explaining why there was a cudaOccupancy runtime call before cudaLaunch for `aten:mm` and not `aten:add`. This is fascinating
I got bombarded with fraudulent orders. Recently someone (competitor) hired some BH service that puts in fake orders using a bunch of different credit cards. Then they chargeback and get your SP banned. Best way to avoid this is to setup
I have a stacktrace right here. This is stable diffusion in pytorch, right after flash-attention was updated. The only difference between clean wholesome image generation and this compute-sanitizer IMA was that flash-attention upgrade remov
A lot of automated crash reports on Windows have anti-cheat DLLs in the stack trace. Very hard for us to do anything about it. Chrome has a hardcoded list of these where they basically ignore crash reports from.
Pet peeve: "I value correctness over performance". It sounds kinda true (FKs, ACID, and writing to something other than /dev/null can be slower). But when the type of system you are building is known (e.g. Postgres), the tradeoff is bo
“Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units” One of the papers I personally found very promising earlier this year just got accepted as an ICML Oral. huge congrats to the authors The core objec
new aithy website https://github.com/dosco/aithy - better secure sandbox - artifact management - tons of built in skills - new multi-service arch. - packaged deployer - mesh networking - usage tracking - permissions system - memories -
minecraft lobby from first principles
After our Perception Tokens paper, we asked: are models truly reasoning over perception tokens or do they just benefit from extra reasoning budget? And why discrete tokens instead of richer continuous ones? With @JackZhang970191 we answer
Scheduled Tasks for Project Think https://github.com/cloudflare/agents/pull/1585 … - use cron patterns or a DSL - run a prompt (or regular code, coming soon) - ... that's it. simple.
added a new exposure catalog to bumblebee https://github.com/perplexityai/bumblebee …
deepseek is the model to watch. amazing value for what it does for batch work. also @FireworksAI_HQ your pricing needs to be updated. 4x more expensive than milking the tokens direct from the creature.
i was honestly surprised when i saw the qwen 3.5 4b working with ax agent in aithy it was writing javascript code in a repl and bash code in a sandbox to do stuff. and all of this running on my m1 pro.
Be very careful clicking on any links, etc. in the official Cloudflare Discord server right now. Crypto scams and more remain unmoderated for days sometimes. Cloudflare disbanded their Community Champs program - folks who moderated the ser
For all you terminal nerds, a quick sneak peek at a QoL improvement shipping in the next version of @evo__hq You can now track your autoresearch runs right inside your terminal, no need to open the dashboard (which evo already ships wit
This paper reframes data efficiency as Data Value Density: more training effect per unit of data, through selection, removal, scheduling, mixing, augmentation, distillation, and evolution. From a Deep Manifold lens, DVD is boundary-conditi
I really want to learn more about inference engineering. Projects like @sgl_project and @vllm_project really excite me, trying to push the limits of a GPU. Really want to be good enough to contribute to these OSS projects. Starting today
Check RACO, accepted as an Oral paper to #ICML2026 (Top 0.7%). We propose a new conflict-averse optimization scheme for LLM multi-objective finetuning, with counterintuitive theoretical acceleration and better empirical pareto fron
Tired of spending weeks of SFM reconstruction? Try VGGT-horseshoe Plugin in @lichtfeldstudio to get pose and dense point + sky segmentation in 20 seconds.
assuming 100,000 human hours spent thinking about this problem (napkin) and a worldwide weighted average of $20/hr for maths research, that's $2,000,000. the cost of frontier maths research just fell at least 2,000-fold.
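The napkin math above, written out. The $1,000 denominator is inferred from the stated 2,000-fold ratio rather than given in the post; the paper quote earlier in this digest says "a few hundred dollars" per problem, which would make the ratio larger and is consistent with "at least".

```latex
\[
  100{,}000\ \text{h} \times \$20/\text{h} = \$2{,}000{,}000,
  \qquad
  \frac{\$2{,}000{,}000}{\$1{,}000} = 2{,}000,
  \qquad
  \frac{\$2{,}000{,}000}{\$500} = 4{,}000.
\]
```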
does anyone at oai know the time it took human researchers to verify the proof
Control strategy for large scale bio datasets serving as substrate for AI is its own science (and one I’d suggest we played a very meaningful role in pioneering). Can’t agree more with Ron. If you see controls in a row, column or on the ed
It doesn't have to be a choice between fast-and-fragile or slow-but-secure but if you have an application you care about you need infra building blocks that guide your project towards simplicity. Systems design is more important than ever.
update — released blindcache v0.2 with semantic search. embeddings happen locally via Xenova/all-MiniLM-L6-v2 in-process, so your text never leaves the SDK to be embedded. mem0, letta, zep all send your plaintext to openai's embedding api
Maple gives you a lot of different options to visualize and explore a trace for any use case. Debugging an error? Use the waterfall. Debugging performance issues? Use the timeline. Exploring the execution order or debugging retries? Use th
Ever wished you could adapt your EP size on the fly for fault tolerance or scaling purposes? Now you can with NIXL-EP! Check it out:
The International Economic Review has just published a wonderful paper by the late Marcus Hagedorn (with special thanks to Iourii Manovskii) on A Demand Theory of the Price Level. It is available through @WileyEconomics here:
introducing >molly a terminal-native discord client discord is slow, heavy, and resource hungry - a browser wrapped in electron, burning your ram just to chat. molly fixes that open your terminal, type molly, and you're in discord. no br
My newest gbrain-evals just dropped - this is how gbrain does vs other options. http://ZeroEntropy.dev is SOTA for reranking and embedding cost, speed, and retrieval success. GBrain beats MemPalace by 1% on LongMemEval and beats Vector R
Data synthesis is not only about generating more data, but about generating the right data efficiently. How to synthesize effective and diverse training data in a cost-efficient way is a key question for LLM improvement. This work provides
A way we have started framing this lately in our team is that all tech debt fixes must be aimed to make the agents better. Everything else is a waste.
Excited to share that our paper “Closed-Form Concept Erasure via Double Projections” has been accepted to #CVPR2026! As generative models become increasingly powerful, concept erasure is emerging as an important problem for trustworthy and
In the agentic economy, once agents start moving value, I believe the most interesting question is: which infra do the agents run on? Rails decide everything here & I think it will condense to a core set of requirements: - *Throughput* is the