Supply-chain attack backdoored 700+ Laravel Lang package versions
A widely used community PHP translation ecosystem was compromised across hundreds of historical releases with a remote-code-execution backdoor
Balanced toward durable technical and policy material while limiting agent/LLM items to cases with concrete artifacts, measurements, or consequences.
A public claim that agents built an entire OS from one prompt becomes a case study in what current software agents can and cannot actually do
US AI power demand is framed as jumping from about 3 GW in 2023 toward 28 GW by end-2026, faster than grid interconnect queues can absorb
A frontier AI lab working with the NSA under classified terms tests how safety policies, lawful-use language, and national-security carveouts survive government demand
Coding-agent workflows have changed enough in six months that earlier advice for using them on large repositories is already becoming obsolete
CXL memory devices can present a terabyte-scale addressable pool to the OS while internally mixing DDR, flash, or custom acceleration tiers
Asymptotically faster matrix multiplication transformed theoretical computer science but still struggles to beat conventional algorithms in everyday computation
A solved AI-designed RNA structure appears to contain a tertiary interaction not previously seen in the Protein Data Bank
A postMessage OAuth redirect flaw produced full-access token theft without phishing or a fake login page
A humanoid robot project is being packaged as a home-buildable open-source kit, closer to assembling a car than assembling furniture
DeepSeek’s compressed sparse attention can still materialize a massive fp32 indexing tensor before top-k selection, making long context memory behavior nontrivial
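For a rough sense of scale, the sketch below computes the size of a dense fp32 score tensor holding one entry per query-key pair. The tensor shape (a single head and batch element, one score per pair) and the 128K context length are assumptions chosen for illustration, not details taken from the item above.

```python
def index_tensor_bytes(num_queries: int, num_keys: int, bytes_per_element: int = 4) -> int:
    """Bytes needed for a dense score tensor with one entry per (query, key) pair."""
    return num_queries * num_keys * bytes_per_element


# Assumed context length of 128K tokens; fp32 means 4 bytes per element.
context = 128 * 1024
size = index_tensor_bytes(context, context)
print(f"{size / 2**30:.0f} GiB")  # prints "64 GiB"
```

Under these assumptions the intermediate tensor is far larger than the top-k result it feeds, which is why materializing it before selection matters for memory planning.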
A WebGPU liquid-glass effect is shown working across three.js and an iframe-rendered HTML-in-canvas page
Installing customer-hosted software is a small part of the work compared with safely upgrading dozens of customer environments later
Card-network fees and rewards are tied to merchant and consumer incentives that stablecoin payment products often underestimate
A paper can be reframed as a structured list of claims backed by code, experimental logs, and data for independent analysis
A new NVIDIA linear-attention model improves memory editing by separating channel-wise erase and write operations instead of sharing one scalar gate
Low-bit training rather than post-hoc quantization keeps weights in {-1,0,1} while retaining roughly 96–97% of full-precision MiniCPM4 performance
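A minimal sketch of the ternary projection step that low-bit training could apply to weights in the forward pass. The absmean scaling rule used here is an assumption borrowed from common ternary schemes; the item above does not say which rule the MiniCPM4 work uses, and the function name `ternarize` is hypothetical.

```python
def ternarize(weights, eps=1e-8):
    """Project floats onto {-1, 0, 1} with a single shared scale.

    Assumed scheme: divide by the mean absolute value, round, then clip.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


weights = [0.9, -0.05, -1.2, 0.4]
ternary, scale = ternarize(weights)
print(ternary)  # [1, 0, -1, 1]

# Approximate reconstruction used in place of the full-precision weights.
approx = [t * scale for t in ternary]
```

In training, a projection like this is typically paired with a straight-through gradient so the underlying full-precision weights keep updating; that part is omitted here.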
Robotics funding is clustering around robot hardware, foundation policies, and the data/simulation/pretraining stack around those policies
A highway removal in Rochester converted land into $229M of development and new municipal tax revenue
Long-form posts on X are claimed to be undiscoverable by web search, AI search, and even Grok, limiting their durability as published work
Starcloud-1 reportedly carries a functioning NVIDIA H100, moving data-center-class AI hardware into orbit
NVIDIA diagnostics can flag an unacceptable NVLink error rate while still recommending that workloads continue running
Preventing chronic disease can extend years of baseline healthcare costs while adding the ongoing cost of the drugs themselves
A DynamoDB-compatible system on Cloudflare Durable Objects now has distributed transactions, core item operations, hash partition splitting, and bottomless storage
A €200 Linux color e-ink reader appears to expose root-readable internals through a web upload path
Coincident detections from paired Geiger-Müller tubes can provide a physical entropy source for visual random-number generation
Maintainers are treating Bun’s AI-assisted Rust rewrite and supply-chain risk profile as grounds to deprecate support
C++ bitfield layout is implementation-defined, so mirroring HLSL/SPIR-V bitfields across CPU and GPU can silently break across compilers
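One way to sidestep compiler-dependent bitfield layout is to define the bit positions explicitly with shifts and masks, so both sides agree on a single integer encoding. The sketch below shows the idea in Python; the field names and widths (4, 12, and 16 bits) are hypothetical and not taken from the item above.

```python
import struct

# Hypothetical layout: (name, bit offset, bit width), lowest bits first.
FIELDS = [("kind", 0, 4), ("index", 4, 12), ("payload", 16, 16)]


def pack_fields(kind: int, index: int, payload: int) -> int:
    """Pack three fields into one 32-bit word at fixed, explicit bit positions."""
    word = 0
    for (name, offset, width), value in zip(FIELDS, (kind, index, payload)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << offset
    return word


def unpack_fields(word: int) -> tuple:
    """Recover the fields using the same explicit offsets and widths."""
    return tuple((word >> offset) & ((1 << width) - 1) for _, offset, width in FIELDS)


word = pack_fields(0xA, 0xBCD, 0x1234)
print(hex(word))                # 0x1234bcda
print(struct.pack("<I", word))  # explicit little-endian bytes for a buffer upload
```

The same shifts and masks can be written in C++ and in shader code, which keeps the layout independent of how any one compiler orders bitfield members.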
Gaussian splatting methods often assume Gaussians do not overlap, and Softmax-GS improves rendering by modeling overlap more realistically
Comparison of ship attitude on reentry between Flight 11 and Flight 12
Still limited by compute, so I built a thing that runs codex in the cloud, powered by @Cloudflare firecracker boxes (and since that's not beefy enough for larger projects, tests are run via crabbox) Uses Ghostty ofc, via WebAssembly. Cod
// Adapt the Interface, Not the Model // I am fascinated by the results across my cheap-model-plus-good-harness builds. This new paper also shows good signs of the code-as-agent-harness thesis. The idea is really simple. Do not touch the
FACT ALERT : In modern agentic coding, 42% of the time is spent on CPU doing tool use such as editing files, running Bash scripts, running lints, etc. The economy of traditional cloud computing charges at $ per cpu core. In the economy of
My acoustics project is 100% agent written c++. It’s also beating every current industry leader on every metric I’m measuring: memory use, cpu use, binary size, perceptual performance, developer ergonomics. Not being able to vibe code c+
"Code as Agent Harness" Agents are becoming less like chatbots that write code and more like systems that run on code. This new Meta paper reframes code as the harness around an agent, the executable layer for reasoning, acting, memory, v
Weekend project. This is my personalized Qwen Harness (that I made over the last 2 days) running locally on my Mac. 30 tokens/second Current SWE score of 74.67% on a smaller subset of problems. I will keep on improving the harness to sque
Some of you noticed limits drained faster in Codex, we root caused it to an optimization that we rolled back that had an impact on cache hit rates when compacting across long running sessions. We fixed this and have now reset usage limits
Congrats to the Webwright team https://microsoft.github.io/Webwright at @MSFTResearch for taking the #1 spot on Odysseys, a highly challenging benchmark for long-horizon web agents: https://odysseys-website.pages.dev/leaderboard Ody
Thanks for sharing our work here! We think this will be super super impactful for inference of long-running agents. Original blog: https://blog.doubleword.ai/speculative-kv-coding …
Codex computer use entirely driving iphone simulator to bug bash a feature it just built
Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code:
"KirbyMM: Outer-Product Based Matrix Multiplication on ARMv9 Processor", Sun Yat-sen U, DATE 2026, Apr 20 https://past.date-conference.com/proceedings-archive/2026/DATA/316.pdf … Best Paper LX2 vs Apple M4 Y. Lu, LineShine chief designer <= http
I just seem to have resolved a common performance issue in Clash-type clients. I switched all domain keyword matching to use Bloom filters, and both memory usage and matching speed improved by an order of magnitude. A similar approach could
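A minimal Bloom filter sketch for the kind of membership test described above. How the post's author maps keyword matching onto the filter is not stated, so this only shows exact membership of whole strings; the class name, bit count, and hash count are hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


rules = BloomFilter()
for keyword in ("ads", "tracker", "telemetry"):
    rules.add(keyword)

print(rules.might_contain("tracker"))  # True
```

Because a positive answer can be a false positive, a filter like this usually sits in front of an exact check: a negative answer skips the slow path, a positive one is confirmed against the real rule set.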
nanobot × CLI-Anything nanobot now becomes your actual computer use coworker Instead of just talking about tasks, it can now directly operate the apps where real work happens - from 3D modeling and design tools to office workflows via C
We fine-tuned Protenix on RNA data using @try_litefold Tune (our multi modal fine-tuning engine) and got 20% jump in pLDDT and 10% jump in the avg TM Score. Currently sota so far on rna structure prediction. More announcements on this.
Claude came up with these techniques for agents to debug MacOS Kernel related issues for a project i'm working on and it's low key insane how effective they are and how deep it manages to go in something obscure to 99.9% of engineers
Can we talk about speculative KV coding? You run an FP8 model to predict the BF16 cache, then just arithmetic-code the residual. We are literally burning extra forward passes purely to shrink VRAM footprints by 4x. Compute is officially che
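A toy illustration of why coding a residual can be cheaper than coding the raw values. It does not implement arithmetic coding or any real model; it only compares empirical Shannon entropy, which bounds what an ideal entropy coder could achieve. The integer "cache" values and the predictor are synthetic assumptions.

```python
from collections import Counter
from math import log2


def entropy_bits(symbols) -> float:
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Synthetic stand-in for cache values: 64 distinct integers.
cache = [(i * 7) % 64 for i in range(64)]
# Synthetic stand-in for a cheap predictor that is off by at most 1.
predicted = [v - (i % 2) for i, v in enumerate(cache)]
residual = [c - p for c, p in zip(cache, predicted)]

print(entropy_bits(cache))     # 6.0
print(entropy_bits(residual))  # 1.0
```

The better the cheap prediction, the more the residual concentrates on a few values, and the fewer bits an entropy coder needs to store it losslessly.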
1M WPM for HBM by end of 2027 is too aggressive "We forecast front end wafer capacity allocated to HBM will grow from 390k wpm at end of 2025 to 500k/960k at the end of 2026-27, eating away at capacity for DRAM where demand is also rising
> codex mac app: knowledge work, learning, reading > cmux + codex cli: coding
every vendor → vendor escalation involves a datadog screenshot. you can't escalate without showing datadog. canonical proof of problem free alpha: datadog should build inline cross-vendor tracing with at-mentions. would overtake slack conn
Real-time telemetry from 50M+ metrics per second, zero cardinality bottlenecks, and predictive maintenance built in. #InfluxDB 3 handles data center ops at scale without external ETL pipelines. ↴
We were able to make the rust rewrite faster than pnpm in all scenarios
Finding the right soundtrack for a scene is still weirdly hard. Adithya is trying to fix that. Introducing taan, it watches your video and generates music around how the scene actually feels.
someone is using v12 on def con quals and it found the bug #OhDear #Slopped
hyperliquid's public api caps you at 1200 weight/min and 1000 ws subscriptions per IP as someone who needs clean sub second data across hundreds of wallets and markets, that's a dealbreaker switched to a source with no rate limits, batch
We’re treating this codebase like it’s under way more scrutiny than before, because it is. A year ago, rewriting an entire 700k LOC codebase in a new language in 6 days was impossible. Today, it doesn’t sound like it should work. And yet,
Today, quite a few friends mentioned that The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems https://arxiv.org/abs/2605.21997 looks a lot like the one in bub: https://tape.systems. Actually, ov
OpenAI has one of the best cost optimized APIs in the game. Flex processing cuts costs by 50% for non urgent workloads, batch api does the same for async tasks. If you're running evals, data enrichment, or background agents you're probably
"Tokenisation via Convex Relaxations" Most LLM tokenizers still use BPE, a greedy merge algorithm that can waste vocab slots on locally good but globally suboptimal tokens. This paper turns tokenizer training into a linear program, then r
Glad to see our INDIBATOR discussed in a Nature Methods article on AI co-scientists! We designed individualized AI scientist agents grounded in real publication and molecular histories to drive molecular discovery. https://arxiv.org/ab
Last night, I gave a talk on my recent research related to Test-Time Scaling. We had in-depth discussions on power sampling, atom-of-thought, my work ETS, self-evolving systems equipped with TTT / TTRL and memory, as well as several extensi
Interesting results, was debating with ppl on this many times. Duplication (multi-epoching or duplicated data) hurts training a lot more than some bad data for pretraining
We really need a benchmark for robotic policies on contact-rich tasks
Claude Mythos alone is finding more vulnerabilities than were found from all sources combined in prior years
I adopted @steipete's coding workflow last year. You just have to talk to your agents. So it's super important to know when and where an agent wants to talk to you! This is what I built cmux around. When you have a lot of codexes/c
did more prototyping on this one, calling it `meshnet` what you see here is each chat on its own VM, but you're just developing on localhost:3000 so it feels local def something here, open sourcing the prototype and prompt so hopefully so
A little secret. About 5% of our production traffic is on the Pi harness, about another 5% is on OpenCode. Reminder you can use your ChatGPT account in a flourishing set of other tools. We’ll continue to make Codex awesome, but you have op
I built an autotriage skill for codex that has a set of guidelines + reads VISION.md from my repos, so issues/prs that have a clear way of - fit vision of the project - being inferrable in code with high confidence - clear fix - can be live
I heard that you like shaders Here’s my attempt at recreating the Apple Intelligence border glow while I wait for WWDC 2026.
Writing CUDA kernels is a nearly solved problem.
Warden is already at $25k in cost this month using almost exclusively Sonnet. We're still a couple orders of magnitude off from where costs need to be for this level of capabilities. Or we need capabilities to jump several orders of magnit
was supposed to fix a tokenizer, but I’m watching gpu mode on torch titan internals instead for some reason
Interesting. Use an LLM as a judge to filter out tokens to mask during OPSD. Slight improvements over normal OPSD but seems a lot more compute intensive?
The quest for reliable on-policy self-distillation continues. Hope something would stand the test of time.
“got a gig collecting memory cards for OpenAI”
RWKV-7 G1g is here: the world's best pure RNN LLM, and a competitive LLM in general. Try https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-2 … for bsz16 7B inference. G1h in June p.s. const 15000+tps decoding on single 5090: https://g
Required a lot of hand holding (not that I mind), and with the right scaffolding I asked Codex to improve the AIR emission metadata. 15 minutes later, we now have the first (afaik) native pure Zig Metal pipeline with matching performance.
everything denoiser
Another memorial weekend thought: for either (a.) single use code - [majority of slopware] (b.) cold cache code - [also typical of slopware] the CPU loading the code is a significant amount of the burned memory bandwidth ...
auth.md is a coordination layer over OIDC/WIF-like primitives, but it does not remove adoption cost, e.g. Linear still has to implement federation, etc. If they’re doing that anyway, a clean OIDC/WIF profile could solve the core problem jus
Bun in Rust is better than the original, and it’s going to keep getting better. We fixed a lot of bugs inherited from the original. We’re fuzzing a lot more.
It's a meme but I really have been letting the psychosis take over as much as possible to figure out what I can actually do with these things - codex desktop app computer use spam - hermes agent that I've been loving - going as hard as pos
Explore our kernel design agents:
The external services that my agent relies on are now all accessed through http://slim.tools. The benefit is that once a service is configured in the web backend, different agents on different devices can immediately use it. Although the
The GPT 5 series progress over the past 10 months with SWE PRO (public) ! • GPT 5.1: 50.8% • GPT 5.2: 55.6% • GPT 5.3 Codex: 56.8% • GPT 5.4: 57.7% • GPT 5.5: 58.6% From 50.8% to 58.6% across the GPT 5 series is slow, steady, and very rea
I remember the good old days when the Department of Energy's goal was to run an (FP64) exaflop machine in 20MW. We thought that was a lot of power!
If you're using qmd for searching a Markdown knowledge base, remember to upgrade to 2.5.1, which was released this week. The old version has issues with Chinese word segmentation, resulting in very poor recall for Chinese content. This is
everything is a bytecode-targetable virtual machine if you look at it sideways. JBIG2 image decompression? virtual machine. x86 mov instruction? virtual machine. truetype fonts? virtual machine. Magic: The Gathering? believe it or not, als
Thought my GPU was cooked, nope. Turns out a random Discord update turned on Clips by default. Turning it off made me go from 100% usage to 7%. Go turn that shit off and save yourself the headache. Hope it helps
Some polymarket builder ideas: -trading spreads of contracts directly: yes June 30th and no dec 30, without the double spread -RFQ for larger bundles, like a mm offloading their positions. -OTC trading in a low-trust way