Biased RSA keys factored via polynomial factorization
RSA keys biased toward zero bits turned integer factorization into an easier polynomial factorization problem, exposing hundreds of real-world keys tied to a patched CompleteFTP bug
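One way a zero-bit bias could make factoring easier, sketched here under assumptions that do not come from the summary: suppose both factors have only small digits when written in some base B = 2^k. Multiplying them then produces no carries, so the base-B digits of n are exactly the coefficients of the product polynomial p(x)q(x), and factoring n reduces to factoring that polynomial over the integers. The base, the digit values, and the helper names below are hypothetical, and the toy factors are not real primes. The polynomial factorization step itself is left to a computer algebra system and is not shown.

```python
# Illustrative sketch only: BASE, the coefficient lists, and all helper names
# are assumptions made for this example, not details of the reported attack.

def digits(n, base):
    """Little-endian base-`base` digits of n, read as polynomial coefficients."""
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(d)
    return out

def evaluate(coeffs, x):
    """Horner evaluation of a little-endian coefficient list at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def poly_mul(a, b):
    """Schoolbook product of two little-endian coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

BASE = 2**16

# Toy "biased" factors: every base-2^16 digit is tiny, so most bits are zero.
p_coeffs = [3, 0, 5, 1]   # p = 3 + 5*B^2 + B^3
q_coeffs = [7, 2, 0, 1]   # q = 7 + 2*B + B^3

p = evaluate(p_coeffs, BASE)
q = evaluate(q_coeffs, BASE)
n = p * q

# Because no coefficient of the product reaches BASE, no carries occur and
# the digits of n coincide with the coefficients of p(x) * q(x).
n_coeffs = digits(n, BASE)
assert n_coeffs == poly_mul(p_coeffs, q_coeffs)
```

In this toy setting, any integer factorization of the polynomial with coefficients `n_coeffs` can be evaluated at `BASE` to give a factorization of `n`. With unbiased keys the digits are large, carries mix the coefficients, and the correspondence breaks.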
Balanced the very strong AI/model news against security, policy, biology, graphics, robotics, markets, and durable builder artifacts to avoid an agent-only slate
An attacker adopted orphaned AUR packages and inserted infostealer malware and a rootkit, making old community-maintained packages an immediate audit target
The final unsolved section of the CIA’s Kryptos sculpture became a new public cryptography challenge after the solution was bought from the artist and re-encrypted
A Big Four report praising business AI adoption used fabricated case studies, turning hallucination from an abstract risk into an institutional failure mode
A browser CAD system can now switch between CPU WASM and GPU WebGPU execution for dual contouring and marching cubes, giving interactive geometry tools a major local performance path
MiniMax released a 428B-parameter open-weight model with 23B active parameters and MiniMax Sparse Attention for million-token contexts
Musk raised roughly $10B in primary SpaceX equity while retaining top-decile founder ownership, showing how capital efficiency can compound into unusually low dilution
Epoch’s audit corrected errors in 42% of FrontierMath Tiers 1–4 problems, raising scores while leaving rankings broadly similar
A maritime law framed around security resilience and industrial self-sufficiency has produced a tiny defense-irrelevant shipbuilding base and higher goods costs
World Tracing generates complete 3D geometry from a single image while preserving a trace from every 3D point back to its source pixel
Playbit is bringing its app runtime to browsers with both the runtime and apps running as WASM, making the browser act more like a portable OS layer
A 22-degree-of-freedom humanoid controlled a six-link pendulum attached to its torso using only its own body actuation, without a sweep and with first-seed success
River’s largest revision delivers 60% higher throughput and 20x faster backlog draining, and adds arbitrary out-of-band signal waits, timers, and CEL wait expressions
MIT Press made a full autonomous robotics textbook freely available, covering mechanisms, kinematics, sensors, planning, localization, vision, and neural networks
A new cryoEM imaging technique improves the raw measurement layer for cellular biology, which changes what researchers can directly observe before modeling begins
Two Sigma found that one model trained across hundreds of stocks beat separate models trained per stock, suggesting useful shared structure in high-frequency market data
A balcony microphone identifies passing birds and updates a framed color e-ink display with a daily collage of the species heard nearby
PhyCo lets a video generation model take friction, restitution, deformation, and force as inputs instead of leaving the physical behavior of a scene implicit
Creative time density names the awe generated when hundreds or thousands of person-hours are compressed into a specific artifact, performance, place, or experience
Compilecat applies closure-style optimizations such as inlining, loop unrolling, scalar replacement, and constant inlining under explicit annotations, making one Chrome example about 4x faster
A compact autonomous laundry-folding robot with onboard private compute, apartment-friendly dimensions, and human-quality folds is being offered for $1,499 in the US and Canada
Google Earth brought its flight simulator to the web alongside desktop-grade features like elevation profiles and expanded import support
US-regulated gold and silver futures are moving to continuous weekend trading on Coinbase, with oil and other contracts planned next
Kimi’s open coding model adds long multi-step coding gains and lower reasoning-token use while keeping the K2.5/K2.6 architecture deployable in existing SGLang setups
AgentKeys cap minting now requires a request-bound client signature from the agent-held device key, so a broker can coordinate but not unilaterally authorize the worker
Unifying Apple OS version numbers reduces the boilerplate and mismatch risk in Swift availability checks across iOS, macOS, watchOS, tvOS, and visionOS
The Skyline Project offers interactive open-source maps of NYC, San Francisco, and London where buildings can be searched by history, architecture, and year
Cheap monitors contain diffusers, polarizers, prismatic sheets, and high-albedo matte panels that cost far more when bought individually from optical suppliers
SCALE-CLIP enables direct comparison of endogenous RNA-binding protein binding across many factors and links large-scale binding maps to splicing analysis
A compact Haskell implementation of 2048 fits the whole game into fewer than 90 lines and comes with an explanatory article
Result #32: @mihai673 has achieved a 30-step improvement over the old 2026/05/09 record by adding a SODA (Pethick et al. 2026)-style anchor towards init. It is unknown whether this technique can also improve the current record. 2/5
New record on GB300 NVL72: SGLang exceeds 12K tok/s per GPU on DeepSeek V4 Pro 1.6T (FP4, 8K/1K), orchestrated with NVIDIA Dynamo (SGLang) and MTP. Per @SemiAnalysis_ InferenceX benchmarks, performance stays strong across the entire in
Five recent notable Modded-NanoGPT optimization results: Result #31: Kai Lion and Florian Hübler have improved their Muown-based run from 3075 to 2995 steps by adding NorMuon & ContraMuon modifications. 1/5
Tokenminning: Token⋅Min⋅ning Get the *same quality* work done in the *same time* as your tokenmaxxing peers but with the LEAST amount of tokens Tokenmaxxing is too easy to hack (just run things in loop, in parallel, etc.) What are some g
I confess: you can't dynamically resize a @modal sandbox. Because you don't have to Sandbox workloads are spiky: install, wait, spike, wait We built our runtime to be *burstable*. Request the min & burst above it when your workload s
DiffusionGemma can now run at 2000+ tokens/sec! We made local DiffusionGemma inference 1.8× faster. Run it on 18GB RAM via Unsloth Studio. GitHub: https://github.com/unslothai/unsloth … Guide: https://unsloth.ai/docs/models/diffus
This data is wrong right now. Artemis are likely using time and sales files from Polymarket US which are currently overcounting World Cup volume by a factor of 100x. This is an error on Polymarket's side. June 10th Poly US did 79M ish. Int
implemented q-chunking on top of it. offline only for now. already converges significantly faster: 84% at 50k steps vs 56% for vanilla fql. online fine-tuning + harder envs coming next
This Manhattan pixel map was the biggest pain in the ███ with the best payoff: a 4 x 8 ft wool blanket I have a whole new respect for pixel artists, they actually have to sweat details. I usually map big and shrink so the visual-overwhelm
over the course of adding features to this app, fable found one difficult. it turns out a certain apple API for programmatically moving windows between spaces silently stopped working 2 years ago. it found extensive discussion about this an
I created this new speedrun track, which compares results in terms of steps rather than wallclock, specifically to give a fair chance to optimizers other than Muon. Happy to see the resulting accumulation of public knowledge!
MiniMax M3, Open-Weight, Now On Hugging Face. Weights: https://huggingface.co/MiniMaxAI/MiniMax-M3 … MiniMax Sparse Attention: https://huggingface.co/papers/2606.13392 …
Now that I have your attention by posting this spinning point cloud GIF, I'd like to propose a litmus test for AI mechanistic interpretability research. You might call it the "interp hammer" test. If the things achieved by a mechanistic in
#つぶやきGLSL float i,e,R,s;vec3 q,p,d=vec3(FC.xy/r*.8-vec2(.4,-.6),1);for(q.zy--;i++<80.;){o.rgb+=hsv(q.z,.5,min(e*s-.3,1.)/35.);s=5.;p=q+=d*e*R*.3;p=vec3(log(R=length(p*1.3)),exp2(-p.z/R),atan(p.y,p.x)-t*.3);for(e=--p.y;s<1e3;s+=s)e+=cos(dot(
Two insights from LeapAlign: 1. Gradient descent, rather than GRPO, is native to diffusion post-training. 2. Early generation steps should be trained, such that image layout can be better optimized. Thanks @hillbig for posting this work.
GPT-5.5-xhigh's FrontierMath 4 score jumped from 35% to 73% after EpochAI fixed errors in the benchmark
We’re extending Harvey's Legal Agent Benchmark (LAB) to in-house contracting. Contracting is the highest-volume workstream for in-house legal teams, where a huge amount of business risk gets negotiated into binding agreements. Benchmarki
Introducing Gemini-SQL2, our breakthrough text-to-SQL capability powered by Gemini 3.1 Pro! We've achieved state-of-the-art results on the highly competitive BIRD benchmark, translating natural language into execution-ready SQL queries.
Big updates for InferenceBench v1.0.1! Some highlights: - 10 more entries to the leaderboard, including Fable 5, Opus 4.8, Kimi 2.6, and Gemini 3.5 Flash - Re-scoring / Re-evaluation of select models See the changes for yourself at: htt
Two steps to SOTA-level depth estimation: a strong T2I model + a simple post-training recipe. No bespoke depth architectures or complex pipelines needed -- the 3D understanding is already in the prior. Fantastic work led by @BDuisterhof !
Claude 5 Fable (Ultracode) "Make a playable alpine glacial valley at sunrise" No meshes or models. Everything you see is math. Fable screenshotted its own work and iterated. Took ~30 mins, ~500k tokens, ~2500 lines of code, and ~$25. Ext
I had a lot of Fable tokens to use up before my weekly reset, so I made this live 3D map of London with Three.js Every train, bus, boat and plane is real and live right now! - Tube, bus and riverboat data from TfL - National Rail trains f
While everyone talks about Mythos vs GPT-5.5, we've tested other near SOTA models on our ErdosBench. Smoke test on 14 problems with 7 models: Kimi K2.6, Gemini 3.1 Pro, GLM 5.1, MiniMax M3, DeepSeek V4 Pro, Nemotron 3 Ultra and Gemma 3 27
PROJECT NULLFRAME: A live telemetry dashboard brought to life with Fable 5 - in Nothing's design language. Your real fps, battery + network; your cursor becomes a seismograph. It tells you when it's simulating. Prompt inspo og @dominikmar
Very excited to see that the core idea of DiffusionGemma directly stems from our work, Residual Context Diffusion (arXiv:2601.22954)! Code- and architecture-level comparisons are attached. RCD is accepted to ICML 2026! See you in Seoul!
You can pick up a baby bird and put it back in the nest. The parents won't smell your hands and abandon it. Songbirds barely smell anything, and that old "the mother will reject it" line is one of the worst wildlife myths going. If yo
Context Arena: Added @AnthropicAI's Claude Opus 4.8 on 8-needle GDM-MRCRv2. Thanks @OpenRouter for the credits to run Opus 4.8 @ max. All results at: https://contextarena.ai Opus 4.8 (max reasoning) lands #2 on AUC@128k, behind only
I'm rate limiting the overall amount of billable ledger entries, so there's an effective max billing per minute and per hour. If we see spamming we don't bill the advertisers. Trying to be as fair as possible. It's fair play, but don't ru
A beautiful "software factory" with its own "software byproducts". As Fable generates 100% Cloudflare IaC coverage, it also produces a perfectly patched API spec and Effect SDK. All important errors and fixed data types are discovered fr
The performance win is pretty huge. Even for really simple shapes, just being able to run dual contouring in a massively parallel environment gives like a ~3x speed improvement.
Built a browser tool called Blobtrack in @cursor_ai so I don’t have to hand-keyframe surveillance boxes in After Effects for hours. Just drag your box and move frames with arrow keys. Added a few fun fx as well.
what you get from STV+ViL can be thought of as a "transmutation" of OPSD's privileged self-teacher supervision -- instead of distribution matching, you purify and reify it into ICL via corrective verbal feedback.
Red sprites from last night. Last night was crazy. I captured over 60 individual sprite events. Captured from southern Minnesota looking toward the storm over Missouri / Illinois.
at amazon in 2023 they rejected my text to SQL tool with Claude 2 because it had a 70% success rate and needed to be 90% to deploy to prod
People frequently ask me how many tasks a benchmark should have. There's no exact answer but here's my intuition- (tl;dr aim for 300-500 tasks)
long-horizon coding is the future
So it begins - Stylocard V2 is in the works Specs: - 20 key stylophone w/ ENIG plating for style points - RISC-V CH32X035 MCU for the lolz - Piezo buzzer for sound - CR2016 coin cell battery - MIDI over USB-C
Reward eng should be the last resort in RL. Curriculum + simple reward
startup data point: founder shows me a customer’s support queue at 9:18pm. 43 tickets. 12 refund edge cases. 3 policy exceptions. 1 angry enterprise account. then asks: “which of these still needs a human?” that is a better pitch than 9
want to point out a few really interesting things here 1. Claude Code is actually the worst performing harness when using the same model, significantly behind opencode and cursor cli this is the core reason i've been against the LLM compa
totally forgot the team built 'sentry local' which takes our tech from Spotlight and bundles it into the CLI, giving you (aka your robot buddies) access to Sentry telemetry (hello traces)
Icon Museum now features a Wall of Icons, so you can explore the full collection on one endless canvas
Your favorite style of putter might be costing you strokes. We tested 79 putters. The top 15 were all zero-torque. Here’s what the data showed
I tried this so you don’t have to. At the end, I got: - 10,000 impressions - $600 spent - 0-400 clicks (tracking isn’t very good) - 0 conversions I’m probably not going to spend more on this platform at the current stage because it’s a ve
it brings me no joy to report I spent a year wondering why I was constantly sleepy and had a low sleep score on my whoop, which was totally cured by simply no longer wearing the whoop.
Very cool paper on the "hacker-fixer loop" by @fjzzq2002 et al. A 3-agent LLM system that automatically hardens benchmark verifiers against reward hacking: 1. Hacker tries to pass the verifier without solving the task. 2. Fixer patche
the most interesting thing here IMO is that this involves On-Policy Self Distillation, but the distillation gradient never touches the generator. this airgaps us against OPSD's biggest weakness — bias introduced by having non-causal privi
SCOOP: Meta plans to clamp down on skyrocketing AI costs inside the company by imposing limits on employees’ token usage, the company told staff in a memo on Tuesday, just weeks after it pushed them to adopt AI tools in their work.
We’ve backfilled FrontierMath: Tiers 1–4 (v2) scores for a selection of notable models, including recent Claude Opus models. You can find these on our website. We will add scores for Claude Fable 5 and GPT Pro models shortly.
just launched scanner by endera -- paste a soundcloud/mixcloud link or upload audio, ACRCloud detects the full tracklist, then each track gets enriched with BPM/key via http://everysong.site and export as txt file for set planning. http
Already 90m deposits in Coinbase high yield earn product on @base (30m from CB users) in 1 day. Nice collab between @coinbase , @SteakhouseFi & @ethena . The DeFi mullet keeps growing.
we all had the same realization: fable is expensive, but great at orchestration. you get 95% of the power at 30% of the cost by letting it orchestrate other models this is why i built omegacode: your agent writes a script that can orche
The Renaissance of Sparse Attention (old dilated like Longformer/Longnet, compressed like DeepSeek, query-aware like MiniMax) vs. Hot linear attention/recurrence: Two separate lines of long-context scaling. We have a series of works with @
prediction: agents will expose a funny lie in enterprise software. half the product surface was not there because users loved it. it was there because humans needed reminders, approvals, queues, status pages, nudges, and meetings to move
How I use Claude Code and Remotion to make animated diagrams. Sorry, it's not a single prompt. 1. Find an input language the model knows well. For example, Mermaid for flowcharts. Claude writes it fluently, so it's my entry point. 2. Use
If you view a typical IPO pop as 20%, then the "bar" for SpaceX is $162 -- which is the current indicated open (although it's been falling)
AI Native companies scale by productizing everything - but for themselves. The customer, though, always buys a service.
Mat got new Update, update your app Basic fixes + A Big one: collective mats! Invite up to 5 friends over iMessage and decorate one mat together, with live sync and widget updates. Plus: - cutout-letter alphabet everywhere - type words i
Cool work on refining coarse VLM actions using a flow matching policy π(a₀ | o) → a₁ where a₀ ∼ N(0, 1) by first reversing (inverting) the given coarse action a₁ via â₀ = π⁻¹(a₁ | o) and then reconstructing it in the forward direction i.e
Raising Fund I/II? Be careful with the big consultants/asset managers. If they do a small bite ($5-20M) but demand multiples in coinvest, they are screwing you and have no plans to be a real anchor. These LPs can write $50-500M. They are