Trivy scanner credentials compromised in npm supply-chain worm
A scanner trusted in CI/CD became a worm delivery path across 60+ packages within 24 hours, showing how defensive tooling can become a privileged supply-chain target
Top 90 curated tweets ranked for substance on 02 Jun 2026 UTC.
Turbopuffer added instant copy-on-write namespace clones with 440ms p50 branch creation, making database-scale branching feel like a filesystem primitive
A 32Gb HBM4E die appears to be around 145–150mm², implying each memory generation consumes more wafer capacity and keeps supply tight through the decade
A $1.5B run rate at 35–40% gross margins reframes Mercor from a staffing marketplace into one of the fastest-growing AI data businesses
A remotely triggerable H.323 connection-tracking parser bug in the Linux kernel can read out of bounds from unauthenticated packets, keeping legacy protocol parsers on the front line
A chaos test with network delay made a node panic during snapshot catch-up and stall two Raft groups, with an upstream fix shipped the same day
An eight-month-old with WWOX epilepsy became seizure-free after an intrathecal AAV delivered a working gene copy, extending n-of-1 medicine beyond ASOs
Citadel’s program would formalize a market for external hedge-fund alpha that feeds back into its own quantitative strategies
A 32K-trajectory study across model sizes, retrievers, and benchmarks claims context compression decisions in deep research agents can be predicted rather than tuned by heuristics
Training vision encoders to see dynamics instead of static frames improves real-world robotics out-of-distribution success by 22.5%
Cerebras’ wafer SRAM jumped 2.2x from WSE-1 to WSE-2 but only 10% on WSE-3, illustrating how SRAM scaling is becoming a hard limit for wafer-scale AI chips
A notebook workflow can design PD-1 minibinders and scale a 96-well plate for roughly $1,000 with success rates above 50% on typical targets
An automatic transmission valve body selects gears from oil-pressure signals without electronics, running a mechanical control system on fluid logic
A 12K-kernel benchmark suggests LLMs can act as selective runtime surrogates during kernel search and defer to real GPUs when uncertain
An open MuJoCo dexterous-hand environment with cube reorientation, sim-to-real tooling, and reproducible setup lowers the barrier to hand-manipulation research
Implementations encode years of edge cases, operational lessons, failed experiments, and invariants, which is why clean rewrites often discard the real system they are trying to replace
Europe’s sovereign digital identity wallet depending on Apple or Google accounts exposes a gap between digital sovereignty goals and mobile platform reality
A malware campaign hid command-and-control instructions in Steam profile comments using invisible Unicode, abusing trusted gaming infrastructure for WordPress infections
Diagonal forget gates offer a RoPE-free attention mechanism that beats RoPE and Forgetting Attention on length generalization after 4K-sequence training
Across 540+ token launches since 2020, the average token spent 70% of its life below launch price, quantifying how low-float high-FDV launches transfer value away from later buyers
The NeurIPS position paper track found heavy AI use despite a human-written requirement, creating a test case for academic review norms under ubiquitous writing tools
Solar panels are no longer the main cost bottleneck; deployment, storage, transmission, grid flexibility, permitting, and AI-driven demand now dominate the problem
A waiver to accelerate Three Mile Island’s restart for Microsoft data centers ties AI infrastructure growth directly to nuclear regulatory decisions
Cross-node prefix-cache reuse in vLLM via Mooncake Store makes agent rollouts cheaper by sharing reused context across distributed training nodes
The field-sizing: content property lets textareas grow naturally without JavaScript autosize hacks
Tracking neuronal proteins from creation to clearance shows waste drains to nearby immune niches and that Alzheimer’s disease blocks the route
0xPPL shut down after betting on a long tail of crypto apps that consolidated instead into a few super-app categories
China’s 1,300-ton undersea data center off Hainan aims to cut cooling costs by using seawater, raising new questions about data-center siting and physical risk
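The Steam-comment item above turns on invisible Unicode, and a short check shows how such text can be flagged. This is a minimal defensive sketch, not the campaign's actual encoding: the digest does not say which code points were used, so the example assumes zero-width characters from the Unicode "Cf" (format) category, and `invisible_chars` is a hypothetical helper name.

```python
import unicodedata

def invisible_chars(text):
    """Return (index, name) for every format-category (Cf) character in text.

    Assumption: the hidden payload uses Cf code points such as U+200B.
    Other invisible characters (for example some whitespace) are not covered.
    """
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

# A comment that renders as "nice profile!" but carries three hidden characters.
comment = "nice profile\u200b\u200c\u200b!"
hits = invisible_chars(comment)
for index, name in hits:
    print(index, name)
```

A scanner along these lines would only surface candidates for review; legitimate text in some scripts uses joiners such as U+200C, so a hit is a signal rather than proof of abuse.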
JEE Advanced 2026 candidate/result infrastructure (https://cdata.jeeadv.ac.in/result2026/) had a public cloud storage misconfiguration exposing bulk candidate data without auth. This exposed ~179.6k result records and ~187.3k admit-card
In 2016 I worked on a 1 million container challenge while working on Nomad at HashiCorp. We spun up a million containers in 5-7 minutes and kept them running for about an hour to show our users Nomad can run large clusters at steady state. 10
We found a debug flag enabled in 6 Microsoft Android apps that turned into a vulnerability. Any app on the device could access the Microsoft account. Affecting: Word, OneNote, PowerPoint, Excel, 365 Copilot, Loop. Here's the full story of
The writeup is here. We achieved RCE in Minecraft Bedrock, turning a 4-byte heap overflow into complete client compromise. @ryaagard details a universal, Bedrock-specific technique for bypassing ASLR and achieving arbitrary read / write
Finally, we can reverse proxy Cursor's own Composer 2.5 to make it available for any agent to use. https://api-for-cursor.standardagents.ai
Can an LLM act as a selective model of a GPU during evolutionary search, by reasoning + forecasting a kernel’s runtime but deferring to a GPU when unsure? We produced 12k kernels + runtimes from evolutionary search, costing 400M reasoning t
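The "predict, but defer when unsure" idea in the tweet above can be sketched as a simple routing rule. This is an illustration under stated assumptions, not the authors' method: `selective_runtime`, the confidence score, the 0.8 threshold, and both toy stand-ins are hypothetical.

```python
def selective_runtime(kernel, surrogate, measure_on_gpu, min_confidence=0.8):
    """Return (runtime_ms, source), using the surrogate only when it is confident.

    surrogate(kernel) -> (estimated_ms, confidence in [0, 1])   # hypothetical API
    measure_on_gpu(kernel) -> measured_ms                       # hypothetical API
    """
    estimate_ms, confidence = surrogate(kernel)
    if confidence >= min_confidence:
        return estimate_ms, "surrogate"
    # Below the threshold, pay for a real measurement instead of trusting a guess.
    return measure_on_gpu(kernel), "gpu"

def toy_surrogate(kernel):
    # Toy stand-in: pretend the model is only confident about short kernels.
    return 1.0, (0.9 if len(kernel) < 20 else 0.3)

def toy_gpu(kernel):
    # Toy stand-in for running the kernel on real hardware.
    return 2.5

results = [
    selective_runtime(k, toy_surrogate, toy_gpu)
    for k in ["short_kernel", "a_much_longer_kernel_source"]
]
print(results)
```

The threshold trades GPU time against the risk of ranking kernels on a wrong estimate, so in practice it would be tuned against held-out measurements.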
Some of the more puzzling unpublished observations from our paper: deep attention layers hate the residual stream of V and love it for QK, but if it has to make a choice, it will satisfy V over QK. Translated to finding: if we learn coeffi
native copy-on-write branching in tpuf is an enormous ship. we already have 10M+ branches in production. you can go try it right now
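Copy-on-write branching, as a general technique, can be sketched in a few lines. This is not turbopuffer's implementation, which the digest does not describe; the `Namespace` class and its layering scheme are hypothetical and only show why creating a branch can cost far less than copying the data.

```python
_MISSING = object()

class Namespace:
    """Toy copy-on-write namespace: branches share frozen layers, writes stay local."""

    def __init__(self, base_layers=()):
        self._base = base_layers   # tuple of frozen dicts shared between branches
        self._local = {}           # private, writable overlay

    def branch(self):
        # Freeze the current overlay and share it; no stored values are copied.
        frozen = self._base + (self._local,)
        self._base = frozen
        self._local = {}
        return Namespace(frozen)

    def put(self, key, value):
        self._local[key] = value

    def get(self, key, default=None):
        # Look in the private overlay first, then newest-to-oldest shared layers.
        for layer in (self._local, *reversed(self._base)):
            value = layer.get(key, _MISSING)
            if value is not _MISSING:
                return value
        return default

main = Namespace()
main.put("doc1", "v1")
dev = main.branch()
dev.put("doc1", "v2")     # visible only on the branch
main.put("doc2", "new")   # written after the branch point, invisible to dev
```

Branch creation here touches only a tuple of layer references, so its cost grows with the number of layers rather than the amount of stored data.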
Grok Voice from xAI controlling my macOS desktop! Built a custom app with 400+ tools. Handles music, browser, settings, lightning fast responses. It's able to spin up Grok Build agents and manage them. Grinding on this, still testing.
So /goal is awesome. Over the past few weeks I used @PrimeIntellect to train a 149M late interaction model based on GTE-ModernColBERT-v1 using PyLate, focused on clause extraction from legal contracts. On the MLEB benchmark it does well
SDPO++ for Continual Learning. Day 5 of Trajectory, we modify Self Distillation Policy Optimization for long horizon agentic tasks. SDPO is a promising route. It learns from a single trajectory, with no group required and failures still p
GRPO has a known dead-zone: when the sampled trajectories are all correct or all wrong, group-relative advantage collapses and learning stalls. On-Policy Self-Distillation tried to give dense token-level guidance but its token preferences
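The dead-zone described above follows directly from the arithmetic of group-relative advantages. The sketch below assumes the commonly described normalization (reward minus group mean, divided by group standard deviation); the function name and the `eps` value are illustrative choices.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps only guards against division by zero; it does not create a signal.
    return [(r - mu) / (sigma + eps) for r in rewards]

mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
all_correct = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)        # positive for successes, negative for failures
print(all_correct)  # every advantage is zero
print(all_wrong)    # every advantage is zero
```

When every trajectory in a group earns the same reward, each reward equals the group mean, so every advantage is zero and the policy gradient for that prompt vanishes.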
Cerebras did what the industry calls impossible: turned an entire 46,225mm² wafer into one chip. Defects on silicon that big are inevitable, so they built in redundancy and custom per-batch masks that route around every bad core, landing ne
user reported a bug that was happening for them but not for me. i whipped up a prompt for them to run in codex. a few mins later: fix confirmed, user opened PR. self-healing software is here, just not evenly distributed
Verification is the hidden bottleneck for knowledge work agents, especially in legal AI — complex, long-horizon work is graded by rubrics with dozens of strict criteria. In new research with @langchain Labs, we study how to verify legal
Today i'm excited to announce storagesdk, built in partnership with @TigrisData. A TypeScript SDK for object storage with snapshots and forks as first-class primitives: branch a bucket per agent run. mutate safely. replay from the same ba
We now have upwards of 500,000 CPUs across our servers. I went to order a block of 100k more. Only one vendor was able to fulfill that order; no other supplier had the scale required for our timeframe. The CPU crisis isn't here yet for mo
One policy. Many robots. Many dexterous hands. XL-VLA learns a shared cross-hand latent space for VLA models across XHand, Inspire, Ability, and Paxini. It retargets trajectories across hands, scales with humanoid and tabletop robot data,
Attackers hijacked Red Hat's legitimate npm scope to push backdoored versions of 32 packages targeting cloud secrets and CI/CD tokens. The malware spread via compromised GitHub Actions OIDC tokens, affecting 9.8M downloads. #DFIR_Radar
Miasma malware compromises Red Hat npm packages in sophisticated supply chain attack, stealing credentials and spreading through CI/CD pipelines using worm-like behavior. Builds on Shai-Hulud tactics with GitHub abuse for verified malicious
Most "agent memory" benchmarks just test whether a chatbot remembers your preferences. That tells you almost nothing about real agents. So we built MemGym: memory evaluation for deep research, coding, and GUI agents, with a clean memory-iso
I have successfully evaluated OPUS on models exceeding 1 trillion parameters with strong results. However, due to some reasons, we are unable to include these findings in the current manuscript. Additionally, I have observed that OPUS demo
Excited to be presenting our work on memory + VLAs at ICRA'26 this Thursday morning (poster 224). We found that a super simple language-based scratchpad with spatial and temporal grounding goes a long way in imparting memory to VLAs. 1/n
currently at happy hour and learning the rodent company just straight up took some of our code, violated the license, still violating the license how did we know? they left links to our issue tracker in it 1-10 how aggro should i be? how
In the past months, I have advocated where I could to stop investing in obfuscation as a protection mechanism. It was good at frustrating humans but machines just don't care that much. Packing is more effective so far but I don't expect thi
As of today, there’s essentially 5 companies who have successfully completed meaningful SOTA moving decentralized *pretraining* runs: - @PrimeIntellect (10B INTELLECT-1, Oct 24) - @Pluralis (7.5B Node0, Oct 25; 8B Agora launch, May 26)
well, one way to be fast is to reduce the active count. another is to address the attention architecture. the sparsity graph doesn’t quite address this angle with param count x axis
best writeup on the topic. most harnesses out there are built with no abstractions in mind, software engineering replaced by giant brittle prompts and the occasional summarization. you cannot test them, you can barely reason about them and
MiMo-V2.5-Pro now available on Zyphra Cloud! Huge context, super fast, optimized and served on @AMD MI355X. Full context at $1/M input, $3/M output, $0.2/M cached. Try now at http://cloud.zyphra.com
Modded-NanoGPT optimization result #29 (2026/05/11): @nilinabra has achieved a new step-count record of 2990 (40-step improvement) by halving the growth rate of the L2-norm of the hidden matrix parameters. This result is better than the
ProgramAsWeights can now trade compile time for accuracy. New Finetune compiler synthesizes 3.6K examples & finetunes 100 steps in 1min — 80% acc on tasks where old PAW got 0%! I used it to build an "Ask me anything" website helper: htt
Marvell CEO says copper wall is moving inside the rack, and copackaged optics is the only way through • Marvell CEO Matt Murphy emphasized at Computex 2026 that the next bottleneck in AI infrastructure is not compute or memory but connect
CVE-2026-8732 (CVSS 9.8) in WP Maps Pro plugin lets unauthenticated attackers create WordPress admin accounts via flawed "temp access" feature. 2,858 attacks blocked in 24 hours across 15,000+ vulnerable sites. Update to v6.1. #DFIR_Radar
I was just scammed for $500K by Polymarket. I am "willo2", the top holder of YES on "MicroStrategy sells Bitcoin by May 31st". Here's what happened:
interesting... the numbers on my a100 gemms are about 2-6% higher consistently with the new mma_throughput pragma vs when not
Serving LLMs is expensive because decoding is bound by memory bandwidth, not raw compute. KV caching solves this by storing each token's K/V tensors once and reusing them at every step, so you skip the quadratic recompute. Pairing it with p
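The saving from KV caching can be shown with a toy single-head decoder. This is a minimal sketch in pure Python, assuming a stand-in `project` function in place of learned K/V projections; it counts projection calls to compare recomputing every step against caching.

```python
import math

def attend(q, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

def project(x, scale):
    """Hypothetical stand-in for a learned K or V projection."""
    return [scale * xi for xi in x]

def decode_without_cache(tokens):
    outputs, projections = [], 0
    for t in range(len(tokens)):
        # Recompute K and V for the whole prefix at every step.
        keys = [project(x, 0.5) for x in tokens[: t + 1]]
        values = [project(x, 2.0) for x in tokens[: t + 1]]
        projections += 2 * (t + 1)
        outputs.append(attend(tokens[t], keys, values))
    return outputs, projections

def decode_with_cache(tokens):
    outputs, projections = [], 0
    k_cache, v_cache = [], []
    for x in tokens:
        # Project only the new token and append to the cache.
        k_cache.append(project(x, 0.5))
        v_cache.append(project(x, 2.0))
        projections += 2
        outputs.append(attend(x, k_cache, v_cache))
    return outputs, projections

tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
slow, slow_cost = decode_without_cache(tokens)
fast, fast_cost = decode_with_cache(tokens)
print(slow_cost, fast_cost)
```

Both decoders produce the same outputs, but projection work grows quadratically with sequence length without the cache and linearly with it. The cache itself still has to be read at every step, which is why decoding stays bound by memory bandwidth.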
Just imagine: you create a model (MinMax M.27) that scores the SAME results as Opus 4.6 on SWE Bench PRO. But when we create a benchmark where your model didn't train, you literally score 0. Because MinMax models are shit, and incomparabl
RL these days generally requires three things: inference, training, and sandboxes. I wonder what AI infrastructure provider has all three?
XBOW is harnessing the power of AI to transform offensive security. Curious how autonomous offensive security is changing the game? It’s day two of the Gartner Security & Risk Management Summit, and we’re ready to talk all things autonomou
Visual encoding is one of those bottlenecks that gets ignored until you’re running image-heavy workloads at scale. Offloading to CPU at near-zero added cost and getting 1.3-30x lower TPOT is not obvious — most people assume CPU offloading h
Despite rapid progress in AI agent research, Korean agentic benchmarks remain largely absent! To narrow this gap, we release K-BrowseComp, a benchmark that requires searching across Korean websites and Korean-language content. https:// a
q: "why don't Sora-like models learn compositional physics understanding or do ICL like how language models learn compositional semantics?" a: every attempt to date heavily leaks information from the future. some even bake it into the bottl
Custom KQL rule in Microsoft Sentinel successfully caught 260 SSH brute force attempts across 3 attack waves in 28 minutes. Rate-based detection prevented alert fatigue while maintaining 100% detection accuracy. Technical breakdown: • Rule
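Rate-based detection of the kind described above can be sketched outside KQL as a sliding-window counter. This is an illustration, not the rule from the tweet: the threshold of 5 failures, the 60-second window, and the `detect_bruteforce` name are all hypothetical.

```python
from collections import defaultdict, deque

def detect_bruteforce(events, threshold=5, window_s=60):
    """Raise one alert per source per burst instead of one alert per failure.

    events: iterable of (timestamp_seconds, source_ip) failed logins, sorted by time.
    """
    recent = defaultdict(deque)   # source_ip -> timestamps inside the window
    alerting = set()              # sources currently above the threshold
    alerts = []
    for ts, ip in events:
        window = recent[ip]
        window.append(ts)
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            if ip not in alerting:
                alerting.add(ip)
                alerts.append((ts, ip, len(window)))
        else:
            # The rate dropped, so a later burst from this source alerts again.
            alerting.discard(ip)
    return alerts

events = [(t, "203.0.113.7") for t in range(0, 20, 2)]   # 10 failures in 20 seconds
events += [(5, "198.51.100.2"), (400, "198.51.100.2")]   # sparse background noise
events.sort()
alerts = detect_bruteforce(events)
print(alerts)
```

Ten rapid failures from one source yield a single alert and the sparse source yields none, which is the property that keeps alert volume manageable.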
On coding, we see a big improvement on Vibe Code Bench, where the model scores 47.6%, a massive 35 point increase from the previous version and outperforming Gemini 3.1 pro, a frontier model. On SWE-Bench (75%) and Terminal-Bench-2.1 (54%)
Running a *causal_policy* mode experiment with Cosmos 3 with Fast-WAM style predictions, jointly training: - FD (action -> video) - ID (video -> action) - policy (past_video -> action+video) - *causal_policy* (past_video -> action)
Congrats to @JetBrains on Mellum2-12B-A2.5B-Thinking, an open-source 12B MoE that activates just 2.5B params, handling both natural language and code with a 128K context. Mellum2 runs natively in vLLM from day 0, with reasoning parser a
nemotron 3 is significantly less sparse than other models (~10% active vs ~3% for kimi K2/deepseek v4)
so @morgallant has optimized FTS tokenization throughput to 423 MiB/s and open-sourced it ( https:// github.com/turbopuffer/al yze …) I keep telling him that it would be really high agency to get to DRAM bandwidth (~100 GiB/s), and he ke
Biggest difference between Codex vs Claude Code: Codex won't stop working towards a /goal. CC quits on hard tasks and needs encouragement to keep going. So weird. BUT Codex is pretty bad at intermediate summaries. Its outputs read like eso
hint variations are quite good at dropping KL shock while keeping the rough tokens touched consistent. OPSD (custom variant): high exposure hint OPSD (custom variant): lower exposure hint OPD (yellow is negative KL, purple is positive).
It is often understated how hard it is to compare multilingual models, especially small ones. With hundreds of people working on it, you can now enjoy:
Audio Dataset Cleaning: All that glisters is not Gold -- Said "high quality" audio datasets are not always high quality, which can leave you puzzled when training doesn't make your transcription or TTS model any better. The most robust app
Today, we're introducing LightOn Console. Three endpoints: /Parse any documents; /Extract structured data; /Search enterprise knowledge with citations. Built-in connectors. MCP-ready. Governance enforced at the chunk level. No infrastruct
every couple weeks I bump up the disk GB on Replicas agents and go “yup, that’s more than enough” then some customer pings me asking for 2x more? so I gotta ask our sandboxing provider for more limits, and their poor engineer has to explai
Hackers exploited Meta’s AI support bot to reset Instagram passwords, briefly defacing high-profile accounts, while Meta pushed an emergency patch and experts warned about AI-assisted account recovery risks.
Cloudflare Sandboxes, Cloudflare Tunnel: You can now expose a service running inside a Cloudflare Sandbox using a Cloudflare Tunnel. Both quick tunnels and named tunnels are supported.
the agent at the top of the long mem eval benchmark leaderboard is built on langgraph! if you're thinking about building a high performance memory system for your agent, you should read this
BlueCyber analyses a January 2026 Mustang Panda PlugX sample delivered through a three-file set dropped by an MSI chain. The sample uses DLL sideloading, staged decryption, manual in-memory loading, and a final WinHTTP connection into the c
Most observability for agents is the wrong shape. You install a debugger. You connect it to your harness. You hope the spans the framework chose to emit cover the failure mode you're chasing. When they don't, you patch instrumentation into
i built an implementation of google's map-reduce paper. > master/worker architecture over tcp rpc > byte-range input splitting > reducers fetch intermediate partitions over rpc > worker health checks + automatic task re-queueing > atomi
I got 2 intel bad boys on Friday. At this point I’m struggling to find more power for all this. - 4x 6000s - 1x DGX Spark - AMD Strix - 4x 3090 - 2x intel arc b70 - Mac mini 16gb - MacBook Pro 32gb 544gb VRAM 300gb mixed Total = 844g
1/5 New paper: Representation Alignment Rests on Linear Structure with Guy Bresler and Yury Polyanskiy The Platonic Representation Hypothesis (PRH) posits that representations of data from different models converge as model performance imp