Mastra-AI npm ecosystem hit by supply-chain attack
Microsoft identified 80-plus compromised npm packages in the Mastra-AI ecosystem after an account takeover introduced a phantom dependency
Top 90 curated tweets ranked for substance on 17 Jun 2026 UTC.
A 120 Hz slow-motion capture shows how much work is required to drive perceived input latency down to almost zero on macOS
A conjecture from 1965 about how many points are needed to pierce families of axis-parallel rectangles in the plane has finally been refuted
Three undergraduates spent ten months building a hacker fab and are reportedly approaching their first complete NMOS transistor on tools they made themselves
A retrieval-conditioned VLA policy can be frozen once and extended to new tasks at test time by adding cheap human-hand demonstrations to a retrieval pool
A new benchmark asks coding agents to ship complete playable Godot projects across 140 tasks, and the best current agent solves only 41.5%
Z.ai’s GLM-5.2 reached the top open-weights score on the Artificial Analysis Intelligence Index while sitting on the cost-performance Pareto frontier
Epoch proposes a 60-plus-task taxonomy of frontier AI research work and rates each task from 0 to 5 by current automatability
Daron Acemoglu’s new working paper models how pervasive automation combined with redistribution can reshape political incentives and repression
A Rust abstraction on Tile IR claims effectively free safety for GPU kernels, with a safe GEMM competitive with hand-tuned CUDA on B200 hardware
A 0.6B encoder compresses long context into latent vectors for a 4B decoder, reducing long-context cost while preserving accuracy
ABC releases what it calls the largest teleoperation dataset to date along with open training and infrastructure for robot policies
GPT-5.4 helped move a medicinal chemistry project from literature review to a validated experimental improvement in a widely used drug-discovery reaction
The FP8-for-everything narrative breaks down in scientific computing, where many workloads still require FP64 precision or careful compensated methods
As interposer sizes grow, TSMC and Intel appear to be converging on glass cores for advanced packages, with OSATs positioned to benefit either way
Form fields can now size themselves to their contents with CSS field-sizing, removing a common JavaScript workaround
Reports of Reliance Communications announcing Telegram IP prefixes through FLAG Telecom raised a live BGP hijacking concern affecting traffic beyond India
Robin Brooks argues China’s oil imports fell because Hormuz was closed and Iran was blockaded, not because Beijing was intentionally stabilizing prices
New USCIS data show average naturalization processing time reaching 9.5 months, with more than 400,000 cases pending over six months
A 0.2% Illinois crypto transaction tax applies even to transfers between personal wallets, creating potentially massive costs for large custody moves
Annualizing Snowflake’s latest quarterly GAAP revenue gives a $5.56B run rate, narrowing the apparent gap with Databricks’ non-GAAP ARR figure
alphaXiv is deploying agents to set up arXiv codebases, resolve environment issues, reproduce core claims, and rank papers by implementation difficulty
Replacing Notion, Roam, and Airtable with a custom app shows how AI-assisted building can make personal software stacks viable for power users
When tacit expertise trains frontier models instead of becoming bespoke software, founders trade durable end-customer revenue for easier distribution through labs
The critique argues that Commerce’s letter to Anthropic stretches export-control rules because access to a hosted model may not constitute an export of an item
The Department of Commerce reportedly held back on adding DeepSeek, CXMT, and more than 100 Chinese companies to the Entity List to avoid escalating tensions with China
Renaissance justice often ran on patronage, which helps explain why Bruno survived earlier trials before the Inquisition finally executed him in 1600
A free visual novel lets players romance more than 25 mathematical concepts, complete with four endings, a genocide route, and secret characters
Many products marketed as vegan leather are just polyurethane plastic, while actual plant- or fungus-based alternatives remain expensive and scarce
A Great White Egret in flight at RSPB Ham Wall in Somerset recently.
Charlton forgot to mention the 80% discount and MFN
Very hawkish dot plot. Nine out of 18 officials have at least one hike this year (and six of those 9 have *multiple hikes*). Only one person has a cut this year, and one participant (presumably Warsh) didn't submit an SEP The statement
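In spring 2025, Zhipu and DeepSeek were both among Jina Reader's very largest customers, and I handled founder support for both of them directly. The impression both left on me was that they were extremely shrewd and very demanding on technical metrics, constantly bringing up p99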
not surprising. to my knowledge there's a single person in the US government with experience working on frontier AI models at a company.
. @ItzSuds won’t stop — he’s sourcing founders earlier and earlier. Locked in allocation with Aurelio to an Uncapped SAFE.
nvfp4 vs mxfp4 is not just different choices of block size and scale format, nvfp4 uses an additional tensor-wise scale factor to overcome the range limit of fp4, and thus can use more precisions for block-wise scale factors.
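The contrast described above can be sketched in a few lines of plain Python. This is a simplified simulation, not any vendor's implementation: the function names are hypothetical, the block sizes (32 for MXFP4, 16 for NVFP4) and scale formats (power-of-two versus FP8 E4M3 plus one FP32 tensor scale) follow the commonly published descriptions, and `round_e4m3` is a crude stand-in for real FP8 rounding. Rounding the MXFP4 scale up to avoid clipping is also an assumption of this sketch.

```python
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes
FP4_MAX = 6.0
E4M3_MAX = 448.0


def round_fp4(x):
    """Round x to the nearest E2M1 magnitude (clamping at 6), keeping the sign."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def round_e4m3(s):
    """Crude stand-in for FP8 E4M3 rounding of a non-negative scale."""
    if s <= 0.0:
        return 0.0
    s = min(s, E4M3_MAX)
    _, e = math.frexp(s)            # s = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)            # clamp to the smallest normal exponent
    step = 2.0 ** (exp - 3)         # 3 mantissa bits
    return round(s / step) * step


def quant_mxfp4(block):
    """MXFP4-style: one power-of-two scale per block (assumed 32 values)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Assumption: round the exponent up so amax / scale never exceeds 6.
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_MAX))
    return [round_fp4(v / scale) * scale for v in block]


def quant_nvfp4(tensor, block_size=16):
    """NVFP4-style: one FP32 tensor scale plus an E4M3 scale per block."""
    tmax = max(abs(v) for v in tensor)
    if tmax == 0.0:
        return [0.0] * len(tensor)
    # The tensor scale maps the largest block scale onto the E4M3 maximum,
    # so block scales can spend their bits on mantissa instead of range.
    tensor_scale = tmax / (FP4_MAX * E4M3_MAX)
    out = []
    for i in range(0, len(tensor), block_size):
        block = tensor[i:i + block_size]
        amax = max(abs(v) for v in block)
        block_scale = round_e4m3(amax / FP4_MAX / tensor_scale)
        scale = block_scale * tensor_scale
        if scale == 0.0:
            out.extend(0.0 for _ in block)
        else:
            out.extend(round_fp4(v / scale) * scale for v in block)
    return out
```

In this sketch the MXFP4 block scale can only move in factors of two, while the NVFP4 block scale has three mantissa bits because the tensor-wise factor has already absorbed most of the dynamic range.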
New nugget in our latest story on the Anthropic Fable saga: Dario Amodei told Howard Lutnick "This means we can't have the model out" Friday after learning of the ban on foreign use. "That's the point," the Commerce Secretary said.
We're launching turbo mode data extraction - 5x faster, 5x cheaper, and 7% more accurate than Azure Content Understanding. 4.5s p50/7s p90 across 1-30 page docs - good enough for realtime user flows.
Are AI agents shape rotators? In this new benchmark, we let the models play campaign puzzles in Opus Magnum, a puzzle game by @zachtronics . Ironically, Claude Opus 4.8 performed poorly, being beaten by GPT-5.5, Gemini 3.5 Flash, and GLM
RQL is a new, clean algorithm for (offline) flow RL! The main idea is to treat flow steps as MDP steps, and use "reversed" flows to generate hindsight flow trajectories for off-policy data.
New work: The Value Axis How do LLMs choose which path to take mid-task? We find they internally track the chance of reaching their goal along a linear axis, akin to a value function in RL. We show it modulates confidence in math & coding
The tightly overlapping beaver twins #Hanamura_City_Animal_Park #American_Beaver 2023
Using off-policy (rollouts of another model) prefixes gives the game away - the model would learn to classify off- vs on- policy even better than they do already. You would get higher eval awareness, not lower, even though it would be bette
vc data point: old diligence asked: - who else is in? - how big is tam? - did a famous firm pass? new diligence asks: - what work disappeared? - why now? - what breaks if GPT-6 gets cheaper? status questions age badly when research is fr
Layering in: 1) The Anthropic/Google data center rental ARR ($26B) 2) And Cursor's end-of-year ARR (potentially over $10B) On an annualized basis, I expect SpaceX's revenue to exceed $60B by year-end.
currently all of the results are getting manually merged in by a single co-ordinator... it's a huge bottleneck... so i'm adding hierarchical merging locks where any agent can apply changes. then i need to start hosting them on aws. i nee
Poor theory of mind is one of the main things keeping models from being good software engineers. They can resolve specific, reproducible bugs, but they struggle to anticipate what users want in the first place, which is much of what buildin
as much as we have fun building evals + environments, at some point, poor grad students (among others) become the bottleneck to improving AI system capabilities. there's a ton of domains that are technically verifiable (but not in ways that
LoopCoder-v2 is out Loop Transformers reuse the same block for recurrent hidden-state refinement — letting models “think” more without simply stacking more layers. We study how many loops are actually worth it in Parallel Loop Transforme
Yeah, I think this is a fair concern. One practical issue is cost: a single 24h Codex run already consumes around 100M tokens, so extending this to the full two-week human window across multiple tasks/trials would quickly reach the 10B-tok
this part is actually very interesting, for the mtp head at t+2 they don't include the kv of the indexer of the predicted value at mtp t+1 for efficiency (indexer sharing) AND found that it leads to better results because it avoids training
LMAO, $sats liability is GONE. GONE! Just in 30 mins ago. TLDR: $sats owe FCC $2.9 Billion. If Auction 113 raises ~$2.921 billion or more, EchoStar owes $0 It’s $3.1 Billion now Project out the rest of the spectrum echostar owns. Ma
Read about it here - https://datalab.to/blog/turbo-extraction… Our latest latency test showed p50 4.67s, p90 7.0s, p99 17.05s. Field accuracy on our internal 225-doc benchmark is 89.5% vs Azure 83.4%. Pricing is $6/1000 pages vs Az
I only recently realized that Zhipu is far from the only lab that has moved away from GRPO. Some teams working on long horizon tasks still rely heavily on PPO or even REINFORCE, and a few have never seriously adopted GRPO at all. It is int
Meltdown, 2023-26 Edition of 16 unique works by @andreasgysin Fully on-chain (ERC-721) JavaScript, WebGL, silent, responsive Zero 10, @ArtBasel
Hm... But often for the wrong reasons. Like the infamous "tell the AI or alien space probe a logical paradox to make it explode". When it's closer to buffer overflows.
Dog Colorful dogs in condiment colors. They have been waiting patiently for summer. Cooling off in the sea, warming up in the sand, then doing it all over again. Favorite food: Nachos
We have a portfolio company where I installed a new CEO. No one said no to him because they all wanted to suck up to the new owner. I was way too hands off. He went on to launch a new vertical and burned a lot of $$$ pre PMF because no
GLM 5.2 is absolutely convinced that it is actually Claude, from Anthropic. When I tell it that it's GLM 5.2, it refuses to believe me, but is willing to check the local agent config to see what model is running. The realization:
New @fulcrum_inc research - Agents are under-elicited: A case study in optimization tasks. We find that simple and general prompt/scaffold interventions can roughly double agent performance by getting agents to use more resources more ef
Databricks announced it has crossed $6.9b in annualized recurring revenue, up 80% year over year. Snowflake's latest quarter puts them at roughly $5.3b ARR, up 34%.
winning position on polymarket usually isn't the smartest analysis it's just being first news breaks -> sharp money moves -> by the time you open the app, the line already priced it in you were late > signal detected > analysis done > o
We’re publishing a new daily report comparing GPU compute prices, price changes, and volatilities across models, with data from @ComputeDesk , Bloomberg: CIBLKWUS, CIHOPUS H100s, the oldest model with the largest install base, currently s
It’s an internal site for usage stats
Every time @mlmabc posted a large TWAP, I wondered why anyone would reveal their execution params to the whole market instead of executing privately So we dug into the data Turns out visible execution is not that bad and can even be che
CMU Advanced NLP Lecture 9: Decoding Algorithms This lecture explains a key aspect of generative LLMs: The model learns a probability distribution, but useful generation still depends on how we decode from that distribution. Greedy deco
Becoming pretty clear the real AI labor story is less mass layoffs and far more org chart restructuring > Good slides from Cloudflare $NET on automating sales support, redeploying the savings into AEs, and driving more growth w/ the same
the self is a model that is used to alter your automatic tendencies in order to improve your safety. it vanishes bit by bit once safety is no longer in question
Quick UX tip: Crossing out completed todo items makes them harder to read Checkmarks + dimming are usually enough
Economists often study labor markets using the O*NET database, which breaks ~1000 occupations into tasks. But these tasks are too coarse-grained to track automation in AI R&D specifically, even in occupations closest to “AI researcher”.
For years now, the actual rate change announced at every FOMC meeting did not matter. By the time the meeting occurred, the move was priced into the SOFR curve weeks in advance. The only exception to this was in September 2024 when Powell s
To succeed at this game, agents must reason about shape rotation, concurrency, and optimizing against competing tradeoffs. To match the human world record on all puzzles would be an insane feat. Agents played the game entirely through a py
GLM 5.2 is the new open-weight SOTA on the Vals Index, Vibe Code Bench and Terminal Bench! It is also #5 across all models, and right on the heels of Opus 4.7 - released only two months ago
another day, another batch haha - @RicursiveAI ( @annadgoldie ) - @AI21Labs ( @AmnonShashua , @origoshen , @yshoham ) - @unconvai ( @mcarbin ) - @inflectionAI ( @mustafasuleyman ) - @hark_labs ( @adcock_brett ) - @simile_ai (
paid 1c on a kuala lumpur temperature call $4,092 on that position right now 10 more open just like it. all at 100c $16,961 all-time. 8,421 predictions. closed tab is wall-to-wall green GFS and ECMWF update every 6h. polymarket prices l
This is FALSE 1. The Govt literally pays €30million+ to a private company, Didean Dochas, to buy houses across the midlands for asylum seekers. This company owns the houses, rents them back to the State and routes all profits through Is
very hard for AI to blow up -- at current market prices on OpenRouter for GLM 5.2 8 200s cost $370k and can churn $1.47m of tokens a year - so 3-4 month payback period and fully tax deductible as equipment
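The payback claim above is simple arithmetic, sketched here with the two figures quoted in the post. The function name is hypothetical, and the calculation assumes the quoted token revenue is realized at full utilization with no power, hosting, or staffing costs, which the post does not address.

```python
def payback_months(capex_usd, annual_revenue_usd):
    """Months needed for annual revenue to cover the upfront hardware cost."""
    if annual_revenue_usd <= 0:
        raise ValueError("annual revenue must be positive")
    return 12 * capex_usd / annual_revenue_usd


# Figures quoted in the post: $370k of hardware, $1.47m of tokens per year.
months = payback_months(370_000, 1_470_000)
print(f"{months:.1f} months")
```

Under these assumptions the result sits at the low end of the quoted 3-4 month range; any operating cost or idle time pushes it higher.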
Bad Apple but I’m drawing it with Strava, frame 1470
Having worked on unlearning for multiple years, it was clear that post-training "fixes" alone were a dead-end. Model learning is way too entangled. With 𝗡𝗨𝗟𝗟𝘀 we decided to architect unlearnability into the model, and scaled it to 1B+
We are taking a big step towards scaling LLMs that can unlearn on demand. Cleanly deleting data from LLMs has proven impossible: training entangles every source in shared weights. NULLs (Natively Unlearnable LLMs) escapes this, keeping mill
We just released an open-weights IDM that action-annotates unlabeled screencasts. We outperform all off-the-shelf models (both open and closed!), many of them being orders-of-magnitude bigger. (1/3)
Etherfi is crushing it with 30k daily credit card transactions and $3m daily volumes Over $1b annualized spend on their cards And it’s still being priced at a fraction of other private companies and tokens doing the same thing Think of
Great work! The coding benchmarks are really impressive. Parallel loops are especially good for memory-bound decoding particularly on edge devices, because the extra compute can often be hidden under memory access.
The heron taking flight from the Anadolu Hisarı pier
POV: You warned your friend not to build a business on top of the scratchy Claude Code endpoint that Anthropic is going to drop for sure
This is so ironic, cause I’m pretty sure they increasingly feel like (at least in CS adjacent fields) joining a frontier lab is their only chance to do frontier (pun) research again
Balmain East House by Studio Johnston Sydney, Australia
Sleepy Wren!
They use fixed-point residual as a halting signal itself unlike previous papers. I think it's close to EqR in spirit of landscape/attractor shaping as it modified training with pre-norm, residual scaling and damping. Other papers focus on t
I’m honestly very excited about Virat Kohli’s new brand. He walked away from a guaranteed ₹300 crore of Puma money and instead threw his lot in with a little-known Indian brand called Agilitas. This is their story. Agilitas was started