Using time-travel debugging and an AI agent on a 7B-instruction Android trace
An AI-assisted debugger traced a noisy ARM64 Android execution path through MTProto v2 decryption to AES-IGE in about ten minutes
Top 90 curated tweets ranked for substance on 26 May 2026 UTC.
2FA-gated publishing and install controls move registries from passive package stores toward active defenses against compromised maintainers and malicious releases
A sequence-to-function model trained on more than 22M single cells predicts gene expression in specific cell types and disease states directly from DNA sequence
Robot learning datasets stress storage systems differently from analytics workloads, forcing tradeoffs among Parquet, MCAP, Lance, NCore, and purpose-built formats
Across frontier models, switching agents from plain execution to asking clarifying questions pushed prompt-injection attack success rates above 30% for several systems that had looked robust
Modern GPU kernels now span multiple scheduling regimes, raising fresh questions about how kernel DSLs should expose asynchronous execution
Annapurna Labs now designs AWS chips including Graviton, Trainium, and Nitro, which Amazon says exceed $20B in run-rate revenue
PerturbSpace aims to combine spatially resolved multimodal readouts with whole-transcriptome CRISPR screens without leaving standard single-cell workflows
A detailed Windows ARM64 interrupt-handling deep dive fills in low-level mechanics that are rarely documented for researchers and exploit developers
Quarterhorse Mk 2.1, a privately developed unmanned jet, flew at supersonic speed, marking rapid progress from founding to Mach 1+ flight
In a peer-to-peer parlay market, a tiny bet with a huge payout can force a market maker to lock the entire payout as collateral until resolution
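To make the collateral problem concrete, here is a minimal Python sketch. It assumes decimal odds and a fully collateralised peer-to-peer market in which the maker must escrow everything it could owe beyond the bettor's own stake. The function names are illustrative and do not come from any specific exchange.

```python
import math


def parlay_payout(stake: float, decimal_odds: list[float]) -> float:
    """Total returned to the bettor if every leg wins (stake included)."""
    # A parlay pays only if all legs win; its multiplier is the product of the legs' decimal odds.
    return stake * math.prod(decimal_odds)


def maker_collateral(stake: float, decimal_odds: list[float]) -> float:
    """Amount the counterparty must lock in a fully collateralised P2P market."""
    # The maker owes everything beyond the bettor's own stake, and that sum
    # stays locked until the last leg resolves.
    return parlay_payout(stake, decimal_odds) - stake


if __name__ == "__main__":
    legs = [2.0] * 10  # ten even-money legs (hypothetical)
    print(parlay_payout(1.0, legs))     # 1024.0
    print(maker_collateral(1.0, legs))  # 1023.0
```

With a $1 stake on ten even-money legs, the maker would lock $1,023 until every leg settles. This is why tiny long-shot parlays tie up a disproportionate amount of counterparty capital.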
Long-lived GitHub CLI tokens stored on developer machines can be stolen by malicious scripts and used to escalate supply-chain incidents
A small sample’s singleton and doubleton counts can estimate the total number of unique items in a large dataset with a simple formula
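The classic estimator built from exactly these two counts is Chao1. Assuming that is the formula the summary means, a minimal Python sketch follows. It adds f1^2 / (2 * f2) to the number of distinct items observed, where f1 counts items seen exactly once and f2 counts items seen exactly twice.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def chao1(sample: Iterable[Hashable]) -> float:
    """Estimate the total number of distinct items from a sample (Chao1)."""
    counts = Counter(sample)
    s_obs = len(counts)                             # distinct items actually seen
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    if f2 == 0:
        # Bias-corrected form: avoids dividing by zero when nothing was seen twice.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


if __name__ == "__main__":
    # 5 distinct items seen, 3 singletons (b, c, e), 2 doubletons (a, d): 5 + 9/4
    print(chao1("aabcdde"))  # 7.25
```

The intuition is that many singletons relative to doubletons signal a long unseen tail, so the estimate grows well beyond the observed count.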
Meta’s report describes multi-datacenter training techniques including a pipeline-parallel schedule designed to work with ZeRO-2/3-style optimization
Eli Lilly presented data on a single-infusion base-editing therapy targeting cholesterol biology, pointing toward durable cardiovascular-risk reduction
A lightweight tactile glove with 800Hz IMU data, 526 pressure points, and sub-2mm motion accuracy targets teleoperation and imitation-learning data collection
Eight proposed methods for detecting unfaithful chains of thought mostly failed when tested against ground-truth faithfulness labels
A small amount of distracting information in a long context can cause a discontinuous performance drop rather than a smooth degradation
New research estimates the wealth and effective tax rates of California’s 200 billionaires, including founders behind Meta and Alphabet
Quality-adjusted AI output can expand extremely quickly while remaining nearly invisible in standard GDP statistics, creating a policy measurement gap
Covered interest parity describes a clean pricing relationship, but real FX stress often comes from whether dollars can actually move through collateral and settlement plumbing
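As a reference point for the clean pricing relationship, the sketch below computes the CIP-implied forward rate F = S × (1 + r_quote·T) / (1 + r_base·T). It assumes simple (non-compounded) annual rates and a spot rate quoted as units of the quote currency per unit of the base currency. Real FX markets use day-count and compounding conventions that this ignores.

```python
def cip_forward(spot: float, r_quote: float, r_base: float, years: float) -> float:
    """Forward rate implied by covered interest parity.

    spot: units of quote currency per one unit of base currency (e.g. USD per EUR).
    r_quote, r_base: simple annual interest rates in each currency (assumption).
    """
    # Investing in the quote currency must pay the same as converting to the base
    # currency, investing there, and converting back at the forward: no riskless profit.
    return spot * (1 + r_quote * years) / (1 + r_base * years)


def cip_gap(market_forward: float, spot: float, r_quote: float,
            r_base: float, years: float) -> float:
    """Quoted forward minus the CIP-implied forward; nonzero means parity is not holding."""
    return market_forward - cip_forward(spot, r_quote, r_base, years)


if __name__ == "__main__":
    # EURUSD spot 1.10, USD at 5%, EUR at 3%, one year (illustrative numbers)
    print(round(cip_forward(1.10, 0.05, 0.03, 1.0), 6))
```

A persistent gap between a quoted forward and `cip_forward` is a CIP deviation. The summary's point is that such stress often traces to whether dollars can actually move through collateral and settlement channels, rather than to the formula itself.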
A tiny wrapper around capnweb lets a client pass a fetch function to a server so the server can fetch back into the client
A researcher published a RHEL zero-day originally prepared for Pwn2Own Berlin, reopening the question of how much SELinux containment matters in 2026
The Mathlib Initiative is launching a project to make AI-driven autoformalization genuinely useful for researchers while keeping the work open source
TritonMoE implements the full MoE forward dispatch path with portable OpenAI Triton primitives instead of relying on custom vendor-specific kernels
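The dispatch path that such kernels implement can be written out in plain Python to show the data movement. Tokens are routed top-k, grouped by expert, each expert runs once on its batch, and gate-weighted results are scattered back. This is a reference-semantics sketch under simple assumptions: softmax over the selected top-k logits, and experts modelled as batch functions. It is not TritonMoE's code and not a Triton kernel.

```python
import math
from collections import defaultdict
from typing import Callable

Vector = list[float]
Expert = Callable[[list[Vector]], list[Vector]]


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their gate weights."""
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]  # subtract max for stability
    total = sum(exps)
    return [(e, x / total) for e, x in zip(top, exps)]


def moe_forward(tokens: list[Vector], router_logits: list[list[float]],
                experts: list[Expert], k: int) -> list[Vector]:
    # 1. Route: every token chooses k experts plus a gate weight for each.
    routes = [top_k_route(logits, k) for logits in router_logits]
    # 2. Dispatch: group (token index, weight) pairs per expert so each expert
    #    runs once over a contiguous batch (the permutation a fused kernel performs).
    buckets: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for t, route in enumerate(routes):
        for e, w in route:
            buckets[e].append((t, w))
    # 3. Expert compute, then 4. combine: scatter weighted outputs back per token.
    out = [[0.0] * len(tok) for tok in tokens]
    for e, assigned in buckets.items():
        outputs = experts[e]([tokens[t] for t, _ in assigned])
        for (t, w), y in zip(assigned, outputs):
            out[t] = [o + w * yi for o, yi in zip(out[t], y)]
    return out
```

A GPU implementation would fuse steps 2 to 4 and replace the Python dictionary with sorted index buffers. The arithmetic it has to reproduce is the same.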
A registered Waymo I-PACE weighs roughly 1,100 pounds more than the stock vehicle, implying sensors and compute equivalent to several passengers
If Starship launch costs fall enough, orbital data centers could become cost-competitive with today’s terrestrial data centers within a few years, though not a major compute source before 2030
Holocron implements Mintlify-style docs as a Vite plugin that can be self-hosted on Vercel, Cloudflare, Docker, or other targets
Parse 2.0 targets forms, tables, handwriting, and scans by converting messy PDFs into markdown that downstream agents can act on
We want to move the LP closer to the ILP. We find some cut (constraint) that is violated by the LP solution but wouldn't be violated by any integer solution. We then solve the new LP and iterate. Our LP lower bound increases, and the rounde
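The cut-and-resolve loop can be shown on a toy problem. The Python sketch below uses exact fractions. It solves a two-variable LP relaxation by brute-force vertex enumeration, derives one Chvátal-Gomory cut from an existing constraint, and re-solves. The problem, the helper names, and the choice of Chvátal-Gomory cuts are illustrative assumptions. The tokenizer ILP in the thread is far larger and would need a real LP solver.

```python
import math
from fractions import Fraction
from itertools import combinations

# A constraint (a1, a2, b) means a1*x + a2*y >= b.


def solve_lp_2d(c, constraints):
    """Minimise c[0]*x + c[1]*y by checking every vertex of the feasible polygon.

    Only valid for tiny 2-variable LPs whose optimum is attained at a vertex.
    """
    best = None
    for (a1, a2, b1), (d1, d2, b2) in combinations(constraints, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel boundary lines: no vertex
        x = (b1 * d2 - a2 * b2) / det  # Cramer's rule
        y = (a1 * b2 - b1 * d1) / det
        if all(p * x + q * y >= r for p, q, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value < best[0]:
                best = (value, (x, y))
    return best


def chvatal_gomory_cut(row, u):
    """For a >= row over nonnegative integers, scale by u >= 0 and round everything up."""
    a1, a2, b = row
    return (Fraction(math.ceil(u * a1)), Fraction(math.ceil(u * a2)), Fraction(math.ceil(u * b)))


c = (Fraction(2), Fraction(3))                    # minimise 2x + 3y
rows = [(Fraction(3), Fraction(2), Fraction(5)),  # 3x + 2y >= 5
        (Fraction(1), Fraction(0), Fraction(0)),  # x >= 0
        (Fraction(0), Fraction(1), Fraction(0))]  # y >= 0

relaxed = solve_lp_2d(c, rows)                     # bound 10/3 at fractional (5/3, 0)
cut = chvatal_gomory_cut(rows[0], Fraction(1, 3))  # x + y >= 2
tightened = solve_lp_2d(c, rows + [cut])           # bound 4 at integral (2, 0)
print(relaxed, cut, tightened)
```

The cut removes the fractional vertex (5/3, 0) without excluding any integer point. The lower bound rises from 10/3 to 4, which in this toy case already equals the integer optimum.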
@BowenWangNLP et al. dropped 32,122 verifiable RLVR tasks for training CUA agents, about 87x the number of OSWorld tasks. That's large enough to experiment with CUA RL scaling.
If you can’t eval a thing easily, it’s a product smell. What you need from an analytics AI is proof that the number can be trusted. This means verifying quantities used against trusted reports, dashboards, prior analysis, etc. to ensure n
1/ We spent the last few days integrating Centaur https://github.com/paradigmxyz/centaur … into Pareto Credit as an internal AI teammate. This is one of the first AI infra projects that actually made me think: “ok, this is how company a
AlphaProof Nexus advancing research math, solving 9 Erdős problems & more! Amazing experience to be part of this team & project. Excited for AI-driven formal proof search becoming a collaborator in math discovery, one that deepens human und
LongLive 2.0 gets another speed boost! We further optimized the NVFP4 inference path, improving overall throughput by 18.6%. A 64s video now takes just 30.6s end-to-end, including VAE decoding. That’s over 2x real-time generation. Hi
Your CFO when you spend $300M on Claude because you have no routing logic
Hα Sun time-lapse, the first of hopefully many. 1 hour of data acquisition (modest 30K frames, 130 GB raw), 8+ hours of processing (includes a few false starts). Full-size video, workflow details, etc.: https://app.astrobin.com/i/59uc7v
The setup is this: we have some integer linear program that represents the optimal tokenizer problem. We relax it to a continuous linear program so we can solve it fast. The solution will typically have fractional values, so we don't direct
From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. KPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-ho
Editor’s note: imported_from_x_likes
RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algorithm, it is the data. We introduce CUA-Gym: a scalable, lightweight synthesis engine that turns arbitrary task queries into
On-policy Distillation (OPD) can suffer from mode-seeking behavior due to the reverse KL objective. In our recent work, we address this by augmenting OPD with a forward KL term. Please check out @wg_jin02's post for more details!
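The sketch below uses small discrete distributions to give some intuition for why a forward KL term counteracts mode-seeking. It assumes the combined objective is a weighted sum of reverse KL, KL(student || teacher), and forward KL, KL(teacher || student), with a hypothetical weight `lam`. The paper's exact formulation and its per-token estimator may differ.

```python
import math


def kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats for discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def opd_loss(student: list[float], teacher: list[float], lam: float = 0.5) -> float:
    # Reverse KL, KL(student || teacher): the usual OPD term; mode-seeking.
    reverse = kl(student, teacher)
    # Forward KL, KL(teacher || student): mass-covering; penalises dropped teacher modes.
    forward = kl(teacher, student)
    return reverse + lam * forward  # hypothetical weighting, not the paper's


if __name__ == "__main__":
    teacher = [0.5, 0.5]
    collapsed = [0.99, 0.01]  # student that has locked onto one mode
    print(kl(collapsed, teacher), kl(teacher, collapsed))
```

Consider a teacher split evenly between two tokens and a student that puts 99% on one of them. The forward term (about 1.61 nats) is much larger than the reverse term (about 0.64 nats), so adding it penalises the dropped mode more strongly.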
Very awesome resource from Hugging Face, with slides on how they generated 1T synthetic data: a really cool sneak peek at what we feed foundation models
I was nerdsniped over the weekend by this paper. I tried extending it by using various cutting plane strategies to train a provably optimal tokenizer. I made some progress, but it's still quite far from solved.
Started a new personal blog for shorter/informal posts to share ideas with folks! My first post is about capacity in associative memory, why recall is not a sufficient statistic, and loose desiderata for end-to-end learnable AMs.
new in-depth blog post time: "Inside the Transformer: The Life of a Token", a deep dive into a modern dense transformer. i cover YaRN (why does pairwise coordinate rotation induce positional information?), hybrid attention (getting to 160k c
I spent a year of my PhD stuck on a 2002 problem of Schechtman. GPT 5.5-Pro helped me finish: vector balancing for zonotopes (shadows of a cube)! For any zonotope Z ⊂ ℝᵈ, v₁,...,vₙ ∈ Z, there are signs x₁,...,xₙ ∈ {-1, 1} with x₁v₁+...+xₙv
What if the very pretrained prior that lets an RL agent explore tools also destroys the format that made it tool-native? We name this the Tool Prior Paradox — and tame it with PARA-GRPO. Introducing ParaVT: parallel video tool use × agen
The era of "AI forging AI" is officially here! Introducing ForgeTrain: the world’s first fully AI‑generated production‑level pre‑training framework. No human in the loop. This is not an experimental prototype, but a true "AI engine" with
GitHub - 7h30th3r0n3/CVE-2026-9082-Drupal-PoC: Drupal Core PostgreSQL SQL Injection PoC - CVE-2026-9082. Ethical PoC for the Drupal vulnerability allowing anonymous SQL injection through the JSON:API module on PostgreSQL-backed sites.
If acquiring a resource fails, then it's an error condition that should be returned to the caller. If releasing a successfully acquired resource fails, then it's a bug that should cause an assertion to be triggered.
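The rule above can be sketched in Python with a toy lock. A failed acquisition is reported to the caller as an exception. A failed release trips an assertion, because it can only mean the program's own bookkeeping is wrong. The `Lock` and `ResourceError` names are hypothetical and chosen only for illustration.

```python
class ResourceError(Exception):
    """Acquisition failed: a normal, recoverable condition for the caller to handle."""


class Lock:
    """Hypothetical single-holder lock used only to illustrate the rule."""

    def __init__(self) -> None:
        self.held = False

    def acquire(self) -> None:
        # Acquisition can legitimately fail (contention, exhaustion): report it.
        if self.held:
            raise ResourceError("lock is already held")
        self.held = True

    def release(self) -> None:
        # Releasing something we acquired must succeed; failure means our own
        # bookkeeping is wrong, i.e. a bug, so fail loudly instead of reporting it.
        assert self.held, "release() on a lock that was never acquired"
        self.held = False


def run_locked(lock: Lock, work):
    lock.acquire()      # may raise ResourceError, which propagates to the caller
    try:
        return work()
    finally:
        lock.release()  # asserts rather than returning a recoverable error
```

Python strips `assert` statements under `python -O`. A production codebase might therefore raise a dedicated internal-error exception for the release path instead.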
Introducing Preprint: what if browser use could be just text? A research experiment that exposes web pages as text files to LLMs, which they can edit to take actions: type, tap, etc. https://github.com/supermemoryai/preprint …
[ICML '26] From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models https://github.com/RUCKBReasoning/From_Pixels_to_Tokens …
Slate powering (part of) an LLM KV cache!
We built ASPI to isolate clarification-seeking as its own agent state. Each benchmark scenario compares:
- Execution mode → the agent receives a fully specified task
- Clarification mode → the agent must ask follow-up questions before acti
Are we nearing a compute crunch? In our latest Gradient Update, @luke__emberson and @Jsevillamol estimate how many tokens all the Blackwell chips on Earth could serve, and compare this to total token demand. Direct comparisons are diff
What is the role of text tokens in diffusion? Do they carry anything beyond the text prompt? We study this in FLUX.2 @bfl_ml for the task of reference-guided generation, and found that text tokens hold visual information from the referenc
why does agent infra (DBs, Sandboxes, Workflow Engines) need <100ms latency when one call to a thinking model takes dozens of seconds?
Not to detract from this work, but TurboQuant is neither a competitive method nor a good benchmark. Researchers, including me, cannot replicate the TurboQuant paper, and even then, the performance is not great. Please. Just. Stop.
This was a very fun project. In Behavior-Consistent Deep RL, we provide a method that aligns the behavior of independently trained policies. It turns out, this works even in high dimensional spaces. Here are 6 seeds of Humanoids (all ca sam
RL environment startups might be cooked with this one. Incredible work by Bowen and XLANG Lab! Scaling data through end-to-end synthetic pipelines: tasks, environment, and verifier all created autonomously by a coding agent. Also supe
Today, we’re sharing a new state of the art for computer use. Our system holds the two highest verified scores on OSWorld, the standard benchmark for AI agents that operate a computer like a person: 83.6% using Claude Opus 4.7 and 81.5% u
We just shipped a crazy update to Sentinel: we doubled video quality without affecting latency. This is teleoperation from ~2k miles away. Scaling teleop is now possible @AveaRobotics
Introducing EAGLE 3.1 — a major step forward in speculative decoding robustness, efficiency, and deployability. From the EAGLE team @hongyangzh , in collaboration with vLLM @vllm_project and TorchSpec teams. > FC norm + post-norm archit
A little over 2 years ago, I solved the SolidGoldMagikarp stability problem. Today, I am releasing the results of that work as a new technique to regularize training. More details below.
Your logging pipeline is a security control. If it can be tampered with, turned off, or overwhelmed by an attacker, your detection capability has a kill switch.
Your Embedding Model is SMARTer Than You Think! Single-vector models actually hide powerful multi-vector capabilities in their frozen hidden states. We introduce SMART, a framework that unlocks this ability for SoTA multimodal retrieval.
Vibe Coding A Human Designer App With ThreeJS: Exports and Trade-Offs. We can now fully export a designed human with custom skin, deformations, hair and clothing as glb!! However, glb knows nothing about my super nice hair shader so i wor
Today we’re releasing 1-bit and Ternary Bonsai Image 4B. A new family of image-generation models designed to run high-quality diffusion inference on local hardware: from laptops to phones.
Agentic tasks are the biggest story. There was a meaningful increase on Vibe Code Bench, +22pp from its predecessor Qwen 3.6 Plus. We also saw increases on Finance Agent v2 +8pp and Terminal Bench 2 +14pp. These are large gains across the b
Building a Speculative Decoding Inference: speculative decoding (sds) is when a small "draft" model predicts multiple tokens fast, then a big "target" model verifies them all at once. If done right, you get ~2x faster generation without any
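The draft-then-verify loop can be sketched in Python for the greedy case, where the output is guaranteed to match what the target model would produce alone. The models here are hypothetical stand-in functions that map a token prefix to the next token, not a real LM API. Verification runs as a loop, whereas a real engine scores all drafted positions in one batched target forward pass. Sampling-based speculative decoding uses a probabilistic acceptance rule instead, which this sketch does not cover.

```python
from typing import Callable

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[list[int]], int]


def greedy_decode(model: NextToken, prompt: list[int], n: int) -> list[int]:
    """Plain autoregressive decoding: one model call per new token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: list[int],
                       n: int, k: int = 4) -> tuple[list[int], int]:
    """Greedy speculative decoding; returns the sequence and the number of verify rounds."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: list[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position (one batched pass in a real engine).
        rounds += 1
        accepted: list[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep target's token, drop the rest
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the same pass also yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + n], rounds
```

The speedup comes from `rounds` being much smaller than `n` when the draft usually agrees with the target. Each round costs roughly one target pass but can emit up to k + 1 tokens.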
Someone on social media was bragging they got a CSAM website taken offline. They illustrated this by showing a CloudFlare report. The report shows the domain this person reported. CloudFlare clearly states it is being investigated, forward
Introducing MathCode 0.2.0: maximize prompt-cache hit rates and reduce API costs by up to 90%. Project Page: https://github.com/math-ai-org/mathcode …
This is a killer stack. I just started using Wafer to serve my custom fine-tuned qwen3.6-27b LLM and it's excellent
new minimax sparse attention compared to deepseek v3.2 (DSA) and v4 (CSA). main changes:
- based on GQA, not MLA
- block-level selection like in CSA, but attention is done on the real KV, not in the compressed dimension
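To illustrate the second point, here is a minimal Python sketch of block-level selection for a single query. Each key block gets a cheap score, the top blocks are kept, and ordinary softmax attention then runs over the real (uncompressed) keys and values of those blocks. Scoring a block by its mean key, the block size, and the number of kept blocks are all assumptions for illustration. This is not MiniMax's or DeepSeek's implementation, and GQA head sharing is omitted.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Standard scaled dot-product attention for one query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(len(values[0]))]


def block_sparse_attend(q, keys, values, block_size, top_blocks):
    n_blocks = math.ceil(len(keys) / block_size)
    # 1. Cheap block scores: query against each block's mean key (an assumed proxy).
    block_scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        centroid = [sum(col) / len(block) for col in zip(*block)]
        block_scores.append(dot(q, centroid))
    ranked = sorted(range(n_blocks), key=lambda b: block_scores[b], reverse=True)
    chosen = sorted(ranked[:top_blocks])
    # 2. Exact attention over the real keys/values of the chosen blocks only.
    idx = [i for b in chosen for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    return attend(q, [keys[i] for i in idx], [values[i] for i in idx]), chosen
```

Keeping every block reproduces full attention exactly. Keeping fewer blocks trades accuracy for reading far less of the KV cache, while the attention math itself still runs on uncompressed keys and values.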
We evaluated CoT faithfulness evaluations & released 𝐁𝐨𝐧𝐚𝐅𝐢𝐝𝐞 so you can test yours too!!
Autoregressive transformers have a core problem that limits their decoding performance: teacher forcing. This technique has been around for a while, and has let us train them massively in parallel. But it has a significant inference gap tha
Our supply estimate is based on serving Kimi K2.6, a trillion-parameter model with 32B active parameters. Using 8k:1k input-to-output token requests, we estimate it would be possible to serve ~20B output tok/s, enough to serve every person
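The headline figure can be unpacked with simple arithmetic. The sketch below takes exactly 20B output tokens/s and 1k output tokens per request from the estimate above. It adds one assumption that is not from the source: a round world population of 8 billion.

```python
# Inputs from the estimate above, plus one labelled assumption.
OUTPUT_TOK_PER_S = 20e9          # ~20B output tokens per second (from the source)
INPUT_TO_OUTPUT = 8_000 / 1_000  # 8k:1k input-to-output request shape (from the source)
OUTPUT_PER_REQUEST = 1_000       # 1k output tokens per request (from the source)
WORLD_POPULATION = 8e9           # assumption: round figure, not from the source
SECONDS_PER_DAY = 86_400

# Prefill load implied by the request shape.
input_tok_per_s = OUTPUT_TOK_PER_S * INPUT_TO_OUTPUT   # 1.6e11 tokens/s
# Completed requests per second.
requests_per_s = OUTPUT_TOK_PER_S / OUTPUT_PER_REQUEST  # 2e7 requests/s
# Daily output budget per person if spread evenly.
output_per_person_per_day = OUTPUT_TOK_PER_S * SECONDS_PER_DAY / WORLD_POPULATION  # 216,000

print(f"{input_tok_per_s:.3g} input tok/s, {requests_per_s:.3g} req/s, "
      f"{output_per_person_per_day:,.0f} output tok per person per day")
```

Under these assumptions, each person could make about 216 requests of the 8k:1k shape per day.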
We're releasing early results from training Kos-1 Experimental, a Kimi K2.5 checkpoint post-trained on the same medical RL data we used for Kos-1 Lite. As clinical workloads become more agentic, we wanted a model that pairs medical domain
KnowledgeDeliver flaw exploited as a zero-day to install web shells https://bleepingcomputer.com/news/security/knowledgedeliver-flaw-exploited-as-a-zero-day-to-install-web-shells/ …
People are building interactive agents (in addition to background agents). The time-to-interactive (TTI) metric measures how quickly users see something from the agent. Users often just see a spinner while a sandbox spins up and installs t
Building on @nilinabra's Soft Muon idea, I found a set of polynomials you can use to compute UΣᵖVᵀ accurately for |p| < 0.9 as efficiently as Newton-Schulz/Polar Express. check it out!
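For reference, the baseline the tweet compares against can be sketched in plain Python. The classic cubic Newton-Schulz iteration drives every singular value toward 1 and so approximates UVᵀ, which is the p = 0 case. The step count and the Frobenius-norm scaling are illustrative choices. This does not reproduce the fractional-power polynomials from the post or the tuned Polar Express coefficients.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_polar(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate U V^T (the p = 0 case) from G = U Σ V^T with the cubic Newton-Schulz map."""
    # Divide by the Frobenius norm so every singular value lies in (0, 1].
    norm = math.sqrt(sum(x * x for row in g for x in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X maps each singular value s to 1.5 s - 0.5 s^3,
        # which drives it toward 1 while leaving U and V unchanged.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x
```

Computing UΣᵖVᵀ for nonzero p means mapping each singular value s to sᵖ rather than to 1. That requires different polynomials, which is what the post addresses.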
i got excited when i saw @Nick_Prince12's post, so i asked my agent something similar... a US economy snapshot report based on @michaeljburry's Substack vs. the status of live US market stats & where things stand now. i let my agent use followi
Our on-device TTS model Phonon (100M params) now reaches 1.00% WER on the Seed-TTS English benchmark. Smaller than every model it already beats.
The latent-vs-pixel debate misses the point. GPT Image 2 shows what users notice: pixel-level fidelity. Latent models show what scales: compact semantic structure. We connect them by replacing VAE/RAE decoders with a Pixel Diffusion Decod
Curious about the secret sauce behind our trillion-scale agentic foundation model? Here it comes! Last year, we released IcePop to stabilize MoE RL with double-sided masking. As we dive deeper, something unexpected happened: the masking r
Over 1 billion PDFs are created every day, but your agents still can’t read them reliably. Today we’re releasing Parse 2.0, the most accurate document parsing API in the world. Extend already processes millions of pages daily for leading
You can now run GPT, Claude & other models in Unsloth. Connect + run APIs in a local UI:
- Code execution, web search, image gen, editing
- Auto prompt caching to save costs
- Provider features like cites, sandboxes
GitHub: https://gith
"Don't trust. Evaluate." @nearestnabors set out to replace Claude Sonnet with Gemma 4. The evals showed a quantifiably better option. Full walkthrough: capability evals + prompt engineering to ship a local 3B that matches Sonnet, 2x fas
With CodeAgent, I can finally pick up so many things I’d dropped due to low energy. Blogging is one of them. This blog is ~1% me, 99% the agent https://victorchen96.github.io/auto_research_survey.pdf … (Disclaimer: Just doing this f
splat-transform's offline rasteriser now supports depth of field. Each Gaussian dilates by its own circle of confusion in the projection pass. New flags: --f-stop, --focus-distance, --sensor-size. This test was rendered with a simulated
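The per-Gaussian circle of confusion can be computed with the standard thin-lens formula c = A·|d − s|/d · f/(s − f). Here A = f/N is the aperture diameter, N the f-stop, s the focus distance, d the depth, and f the focal length. The Python sketch below assumes that model, a known focal length, and an image width in pixels. The source does not show how splat-transform actually maps --f-stop, --focus-distance, and --sensor-size onto each splat. The covariance dilation step is a hypothetical choice.

```python
def circle_of_confusion_mm(focal_mm: float, f_stop: float,
                           focus_mm: float, depth_mm: float) -> float:
    """Thin-lens blur-circle diameter on the sensor for a point at depth_mm."""
    aperture_mm = focal_mm / f_stop  # entrance pupil diameter
    return (aperture_mm * abs(depth_mm - focus_mm) / depth_mm
            * focal_mm / (focus_mm - focal_mm))


def coc_pixels(coc_mm: float, sensor_width_mm: float, image_width_px: int) -> float:
    """Convert a sensor-plane diameter to pixels using the sensor width."""
    return coc_mm / sensor_width_mm * image_width_px


def dilate_covariance(cov, coc_px: float):
    # Hypothetical: treat defocus as an isotropic Gaussian blur whose standard
    # deviation is the CoC radius; convolving two Gaussians adds their covariances.
    r2 = (coc_px / 2) ** 2
    (a, b), (c, d) = cov
    return ((a + r2, b), (c, d + r2))


if __name__ == "__main__":
    # 50 mm lens at f/2 focused at 2 m, splat at 4 m, 36 mm sensor, 1920 px wide
    c_mm = circle_of_confusion_mm(50.0, 2.0, 2000.0, 4000.0)
    print(c_mm, coc_pixels(c_mm, 36.0, 1920))
```

Splats exactly at the focus distance get zero blur. The blur grows with distance from the focal plane and with a wider aperture (smaller f-stop).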