Biohub releases ESMFold2, ESMC, and ESM Atlas
Open protein models and billion-scale folded-protein atlases give biologists shared infrastructure for structure prediction, design, and discovery
Skipped many near-duplicate AI agent launches to keep security, biology, hardware, markets, web engineering, and weird artifacts represented.
Thousands of credentialless MCP endpoints show agent tooling is being deployed faster than it is being secured
A small OCR model reached strong document, table, handwriting, math, and layout performance while remaining practical to run locally
Editor’s note: imported_from_x_likes
Five AI-assisted papers across algebraic geometry, representation theory, number theory, and combinatorics passed peer review in solid math journals
A large parallel agent scan on public models found and fixed several high-severity vulnerabilities in a real codebase
Modos Flow shipped as a 60Hz touch e-ink monitor with complete MCU firmware, FPGA code, and KiCAD board files
A sandbox escape affecting Firefox and Firefox Focus for Android reached public disclosure after Mozilla fixed it in Firefox 151
A years-long Parquet effort finally pinned down how floating-point values and NaNs should behave across implementations
Instanced rendering, faster selected-segment drawing, hit-testing changes, and selection-color gathering produced major speedups in a core design tool
Glassworm targeted developers through poisoned VS Code extensions, npm packages, and GitHub repos while using Solana, BitTorrent, and Google Calendar for resilient command and control
Protein ML now has a FineWeb-like cleaned dataset bundle covering sequence, structure, and related modalities instead of scattered supplementary tables and FTP mirrors
A neutral routing layer reached massive token volume and venture scale while the market debates whether model labs will own the whole stack
ByteDance is considering an AI infrastructure buildout financed by tens of billions in annual profit, putting private capex on hyperscaler scale
BYD’s rise relied partly on billions owed to suppliers as cheap financing, and regulators are now forcing a cleanup of that balance-sheet strategy
A mainstream brokerage is allowing tools like Claude and Cursor to connect to segregated accounts and place autonomous stock trades
Chrome’s handling of wheel listeners around number inputs can mutate values unexpectedly, with the fix not landing until Chrome 150
Nearly one million authentic-standard MS/MS spectra are now available inside a metabolomics analysis platform as experimental data rather than predictions or crowdsourcing
A field robot adapted to real customer sites using short onsite data collection, attacking the last-mile gap between robotics demos and deployment
A real-time detection and segmentation model family now ships through Transformers with fine-tuning tutorials and webcam demos
An open-source architecture tackles the k-means bottleneck in late-interaction retrieval with up to 247x faster clustering and 9.8x faster retrieval
A chemical-data splitter that once required multi-terabyte-memory machines can now run robustly on consumer hardware
A practical ARM64 walkthrough shows how an information leak and a second bug combine into code execution even though either bug alone is harmless
After memory chips and CPUs, AI demand is now disrupting optical communications components that data centers need to scale
A development contest asks entrants to fit complete games into the size of a single floppy disk while still allowing modern engines if they fit
Samsung common shares trade at a 63% premium to economically similar preferred shares, implying a large valuation gap inside the same company
Researchers replaced gravel with forest soil and grass at daycare yards and saw measurable immune-related changes in children within a month
Quantum-tech advances are enabling optical interferometry schemes that could synthesize telescope apertures far larger than any single mirror
Two old TEM control panels were reverse-engineered into a working game controller, preserving obscure instrument muscle memory in software
Installing Lean through VS Code and creating a starter mathlib project produced a seven-gigabyte directory, revealing the cost of modern formal-math tooling
A great-power conflict could exhaust missile stocks quickly, and simply founding more missile startups does not solve the brittle solid-rocket-motor supply chain
Found a local privilege escalation on the latest Linux. Reported to the vendor, awaiting CVE. Writeup after the fix lands.
auto: [Android 5.15] KASAN: use-after-free Read in atime_needs_update: Detected use-after-free in atime_needs_update function, leading to a read of size 4 in fs/inode.c. Issue found in task syz.1.1270 during unlink syscall. link: https:/
We evaluated Gandalf, our agentic judge, on a new meta-evaluation dataset called BankerVerifierBench (BVB), built on top of BankerToolBench (BTB), a long-time-horizon investment-banking benchmark. Gandalf achieves the highest performance an
Built a speculative decoding inference engine in Triton (more tests still ongoing). For now, I've tested with the GPT family because that's what I can run on my personal GPU (4 GB), and I'm outperforming SGLang on both throughput and correctn
The unlock was a self-improvement loop. We record production misses: unsupported fields, wrong predictions, and corrections. Codex then uses that context to autonomously create evals from production data, hillclimb against them, and open
A single Meta engineer burned roughly $500K/month in token consumption (about 300 billion tokens/month) on the company's internal "Claudeonomics" leaderboard, which ranked employees by token usage. The leaderboard ran from March, employee
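As a rough sanity check on the two figures quoted above, the implied blended rate can be worked out directly. This is a minimal sketch that uses only the numbers in the post; the function name is illustrative, and the result says nothing about how the spend was actually priced.

```python
def usd_per_million_tokens(monthly_usd: float, monthly_tokens: float) -> float:
    """Blended cost in USD for each one million tokens consumed."""
    return monthly_usd / (monthly_tokens / 1_000_000)


# Figures as quoted in the post: ~$500K/month and ~300 billion tokens/month.
rate = usd_per_million_tokens(500_000, 300e9)
print(f"${rate:.2f} per million tokens")
```

At those figures the blended rate comes to well under two dollars per million tokens, so the eye-catching total is driven by volume rather than by an unusually high per-token price.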
with the Lightcone API you can run 50 fast computer use agents in parallel, each on its own machine. model and infra via one API. we built a demo that summarizes news and maps it.
The speed-of-light optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs. In the PyTorch Foundation's latest com
There are so many sandbox providers out there. Why is there no first-class integration into Claude Code, Codex, Opencode, etc? I just want to spin up 10 agents with the full ability to run my local code independently and communicate with
Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (
Laguna M.1/XS.2 tech report from Poolside has lots of details on their infrastructure!
slime was built for agentic RL from day 0. We added an Agentic RL Training Roadmap that brings together the pieces already in slime for agent workflows: custom generation, verifier/test-based rewards, fan-out samples, async rollout, SGLang
So awesome, Liang Sheng! I burned through 460 million tokens in a single day, and it only cost me about 30 RMB. If this were Opus or something, it might easily run up 700 bucks or more, who knows, but the actual performance here is pretty
MiniMax-M2 paper just dropped. The key focus of M2 is on something more agent-native. It trains on runnable workspaces and artifact-grounded rewards, then uses Forge to scale RL over long coding, app, search, and office-task trajectories.
A dangerous Windows Kernel EoP vulnerability allows browser sandbox escapes. Public PoC exploit code is available on GitHub. #Windows11 #Infosec #KernelExploit #CVE202640369 https://securityonline.info/windows-kernel-eop-vulnerability-p
Things that DeepSWE does well on (among the long-horizon benchmarks out there): a 0.3% false-positive rate vs. SWE-Bench Pro's 8.5%, with an independent LLM-analyzer audit on every trial; pretty good contamination resistance, as seen from the canary GUID; fairly
1/ Today we're releasing AttuneBench, the first open EQ benchmark grounded in real multi-turn human-model conversations, scored against what the person actually felt and wanted at each turn. Built by the research team at @pareto_ai in co
Grading agent rollouts in rubric-graded RL environments is itself a hard task. Prior approaches pass serialized artifacts or agent trajectories to an LLM judge; this loses information / doesn't support sophisticated criteria. In contrast,
DwarfStar prefill is a hell of a lot faster after merging https://github.com/antirez/ds4/pull/264. I need to update the README benchmarks, as they no longer mean much :D after a +40% prefill speed boost.
SGLang v0.5.12.post1 is live. This is a stability patch on top of v0.5.12, with 12 cherry-picks focused on DeepSeek V4, NIXL PD disaggregation, and Blackwell. DeepSeek V4: Fixed V4-Pro garbled text on single-token decode (B200/B300). Fixed
EAGLE 3.1 is out. The team identified attention drift as the root cause of acceptance-length degradation at deeper speculation steps. Fix: FC normalization + post-norm hidden-state feedback. Result: 2x longer acceptance length in long-cont
Shipped @getbuzzr/dfs-engine 4.0 today: strict settlement contracts, typed invariant errors, hardened payout math for PrizePicks/Underdog-style DFS grading. 4 new companions also at 1.0: dfs-cli, dfs-react, dfs-provider-sportradar, dfs
rtk: a Rust CLI proxy that cuts Claude Code token usage by 60–90%. It filters output from git, tests, lint, kubectl etc. before it hits the LLM context. A pre-bash hook reroutes commands like git status through rtk to strip redundant info.
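The idea of trimming command output before it reaches the model can be illustrated with a toy filter. This is not rtk's implementation (rtk is written in Rust, and its actual rules are not described here); `compact_git_status` and its two rules are hypothetical, chosen only to show how dropping blank lines and hint lines shrinks the text an agent has to read.

```python
def compact_git_status(output: str) -> str:
    """Hypothetical filter: drop blank lines and git's parenthesized hint lines."""
    kept = []
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no information
        if stripped.startswith("(use ") and stripped.endswith(")"):
            continue  # usage hints such as (use "git add <file>..." ...)
        kept.append(stripped)
    return "\n".join(kept)


# Sample text resembling `git status` output; written by hand for illustration.
sample = """On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
\tmodified:   src/app.py

no changes added to commit (use "git add" and/or "git commit -a")
"""

compact = compact_git_status(sample)
saved = 1 - len(compact) / len(sample)  # fraction of characters removed
print(compact)
print(f"characters removed: {saved:.0%}")
```

The savings on this one sample are not a measurement of rtk; the 60–90% figure in the post is the author's claim about the real tool.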
We just shipped sandbox-sdk v0.10.2 today: Cloudflare tunnels support, mounting R2 buckets directly from Worker bindings, and isolated exec() calls. Small release, but a lot of quality-of-life improvements for agents running in containers.
New paper on activation mixing. The authors evaluate several mixing strategies across both classical FFNs and SwiGLU FFNs, with ablations on dense and MoE models. One interesting result: the most expressive mixing strategy isn’t the best ch
Can current code agents survive beyond single-repo bug fixing? BeyondSWE: 500 real-world tasks from 246 GitHub repos, covering cross-repo issues, domain-specific fixes, dependency migration, and doc-to-repo generation. https:// arxiv.or
ECHO paper + code are now live! We open-sourced a small SkyRL-based implementation of "world loss" for terminal-agent RL. GRPO trains on what the agent did. ECHO also learns from what the terminal said next. Same rollout. Same policy fo
Gandalf code: https://github.com/Handshake-AI-Research/gandalf-the-grader Blog post with details: https://joinhandshake.com/research/ai/gandalf-the-grader/
New: grep for exact matching. grep → keyword / regex matching; search → fine-grained semantic retrieval. Works across uploaded content, including text, PDFs (OCR), and audio/video (transcription). Give your agents both retrieval primitives t
Tasks require agents to investigate Kubernetes incident snapshots through shell commands and submit a structured JSON diagnosis identifying the responsible root-cause entities. In one public SRE task, the agent sees user-facing failures in
The bottleneck in LLM inference isn't compute. It's how fast you can move the weights. Our CTO Mathias Lechner, @mlech26l, joins Piotr Mazurek, @tugot17, from our inference team, to discuss what actually limits token throughput and how
The MiniMax M2 series was one of the most widely used open-weight LLM series earlier this year. Now, we got a technical report with some interesting tidbits. I summarized some of them below: 1. Full attention as an anti-trend?: They tried
Most AI products still reset after deployment. Trajectory is building something more interesting: AI systems that continuously learn from real usage. Every correction, retry, and edit becomes training signal instead of wasted data. They’
Early on, Tax AI handled simpler returns. By season’s end, it processed K-1s, rentals, LLCs, deductions, and more. At launch, ~25% of returns hit 75%+ field completion. Six weeks later: 86%. Now it drafts returns with up to 97% accuracy,
Behind the build of self-improving tax agents with Codex: we co-built Tax AI with @ThriveHoldings around tax prep workflows so when reviewers fix any errors, Codex can trace the failure, improve the system, and test the change before it
Introducing a minimal training harness built on prime-rl and verifiers, so you can now train your own RLMs without sandboxes! All available in the `training/` folder in the RLM GitHub repo! We train RLM-Qwen3-30B-A3B-v0.1, using RL on a se
Yesterday I received an email to notify me of a case that looked like a malicious Google sponsored ad result. I tried to make sense of it, unraveling some obfuscated JavaScript, then stages of Batch and PowerShell (with some interesting cod
Per @EpochAIResearch, the world's Blackwell GPUs can produce roughly 500M–20B output tokens per second today, depending on context length. Inference capacity is growing 3.4x/year. Token demand is growing 10x/year. Long-context workloads
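The two growth rates quoted above imply a widening gap, which a short calculation makes concrete. This sketch assumes both rates stay constant and that demand equals capacity at the start (an initial ratio of 1.0); neither assumption comes from the post, and the function name is illustrative.

```python
CAPACITY_GROWTH = 3.4  # inference capacity multiplier per year, as quoted
DEMAND_GROWTH = 10.0   # token demand multiplier per year, as quoted


def demand_to_capacity_ratio(years: float, initial_ratio: float = 1.0) -> float:
    """Ratio of demand to capacity after `years`, assuming constant growth rates."""
    return initial_ratio * (DEMAND_GROWTH / CAPACITY_GROWTH) ** years


for years in (1, 2, 3):
    print(years, round(demand_to_capacity_ratio(years), 2))
```

Under those assumptions, demand outgrows capacity by a factor of about 2.9 each year, so the gap compounds rather than growing linearly.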
Everything you always wanted to know about Transformers.js, in one video. I made a deep dive into how AI models run from JavaScript: tensors, ONNX, quantization, `pipeline()`, WebGPU/WASM, preprocessing, postprocessing, and what happens un
Someone debugged for half a day, only to find their RL was forever stuck at (EntropyTaskRunner pid=x) self.use_critic = need_critic(self.config) Turns out this pig very thoughtfully reused the same submit_task.sh, allocating a full 16
And poof! Just like that, all that obscurity to hide feature extraction/heuristic logic/verdict weights means fuck all now, and I'm so happy those prickly vendors. https://trustedsec.com/blog/the-defensive-stack-is-exposed - @HackingLZ
Figuring out how to benchmark agents on realistic biology research has quickly become one of my favorite types of engineering work. You work with scientists to get to the core of some biological claim, precisely assembling raw data/prior li
HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval. Microsoft introduces a recipe to distill large SLM retrievers into compact query encoders for Bing Ads.
AutoResearch AI: this paper is definitely worth reading. It's not about the single-point capability of "AI helping you summarize papers," but a bigger trend: research is moving from task-level AI to workflow-level AI. In other words, AI in
Tired of benchmarking your optimizer on Hartmann and Branin? Try BoLT, our new black-box optimization (BBO) benchmark grounded in 20K+ real LLM experiments instead! LLMs involve expensive, derivative-free decisions that BBO is built to h
Unbelievable that I built the fastest, most complete MP4 parser in the world and just keep it in a private repo. Haven't worked on it lately, but it's in a great state. It has: io_uring, WASM, strict ISO mode, 100% required boxes impl
brooo trust me kimi is like gpt-5.5 but faster and cheaper, just let me add one more gpu to my local cluster bro, I promise it’ll be even faster and better
Every millisecond matters. We're open-sourcing the tokenizer we built and deployed in production; it's far more efficient than Hugging Face and SentencePiece.
Check out the new ESM models we’ve been building at @biohub! ESMC + ESMFold2 are open-source SOTA for protein structure prediction and design. Plus: an interactive atlas of 6.8B+ proteins!
Hey, mom, I did a thing! Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use | Proceedings of the ACM Conference on AI and Agentic Systems
Today, @MichaelElabd, @QuantumArjun, and I are excited to announce Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage,
AI attackers have terrible OPSEC. Use it against them. Hallucinate exposed services. Waste their tokens. Seed prompt-injection traps, canaries, and honeytokens where the attacker's LLM will read them. Have fun.
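One way to picture the honeytoken idea is a unique marker planted in decoy content, plus a check for whether that marker later shows up in inbound traffic. The sketch below is an assumption-laden illustration, not a tool the post describes: `make_canary`, `find_tripped`, the decoy locations, and the log lines are all hypothetical.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Generate a unique, hard-to-guess marker to plant in decoy content."""
    return f"{prefix}-{secrets.token_hex(8)}"


def find_tripped(canaries: dict[str, str], log_lines: list[str]) -> set[str]:
    """Return the planted locations whose canary appears in any log line."""
    return {
        where
        for token, where in canaries.items()
        if any(token in line for line in log_lines)
    }


# Hypothetical decoys: one marker per location, so a hit says which bait was read.
readme_token = make_canary()
env_token = make_canary()
planted = {readme_token: "fake README", env_token: "decoy .env"}

# Hypothetical request log in which only the decoy .env value was replayed.
logs = [f"GET /api?key={env_token}", "GET /health"]
tripped = find_tripped(planted, logs)
print(tripped)
```

Using a distinct marker per decoy is the design choice that matters here: a match identifies which planted file the attacker's tooling actually ingested.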
"It's easier to tune the LR for method A than for B." We tried to formalize this for model-based stochastic optimization methods. We find a key quantity, called stability index, that describes how stable a (weakly) convex bound is as a fu
First time I've seen a coding agent do this: GPT 5.5 bumped resource allocation to unblock itself, and then went back and tuned it in a polish pass I didn't even ask for.
TyphoonPWN 2026 Unpwned. Found a bug in the "ipTime Router WAN PreAuth Remote Code Execution" category ($10,000) using an LLM and reported it in February for TyphoonPWN 2026. Unfortunately, it was patched in March before the event. #TyphoonC
Agentic kernel generation has mostly focused on a few hot kernels: MLA, GDN, sparse attention, etc. But there is a massive set of classical ML operators that still haven’t received the same level of attention. That’s what makes Flashlib exci
Apple finally published this. I found a bug in `awdd` that exposed `AWDMetadata.bin` and their response was to straight-up remove the daemon entirely. Very interesting!
[1/5] Works on test set contamination focus on detection, but we show *correction* of inflated test scores is possible. https://arxiv.org/abs/2605.24818 Our proposal is to spike the training data and insert some test examples at known rat
LLMs represent concepts as vectors. Strikingly, taxonomies (organism → animal → bird) appear as hierarchies in embedding space. Led by my student @AndresNava , we show this comes from co-occurrence statistics alone. http:// arxiv.org/abs/
I really appreciate the lessons and technical ideas @samaysham & team were able to share about their tax agent system, which learns from production traces to self-improve via detailed tracing tightly integrated into deployment + an autono