France’s domestic intelligence agency is replacing Palantir with ChapsVision
France ended a repeatedly renewed DGSI contract with Palantir after ChapsVision became viable enough as a domestic alternative, following a similar German move
Selected one representative item from crowded GLM, Anthropic/Fable, and SpaceX clusters to keep the slate broad.
GLM-5.2 put an open-weight model near closed frontier systems while shipping practical deployment support and a permissive license
Audited figures reportedly put OpenAI’s 2025 spending at $34B, including $19B on R&D and nearly $6B on sales and marketing
DeepSeek’s first external round reportedly required investors to put capital into an LP run by CEO Liang Wenfeng, an unusual structure for a frontier AI company
Using newly public, retail-inflated, low-float equity to buy Cursor would let SpaceX acquire a real business while effectively selling into IPO demand before lockups expire
HUG learns multi-finger robot grasping entirely from human hand data collected with smart glasses, then retargets it to humanoid hands
The work maps higher visual neuron responses to natural-language descriptions in an automated, verifiable pipeline at scale
The proposed method attacks BPTT’s sequential, unstable O(T) gradient path and reframes how expressive RNNs can be trained
Commerce framed Fable access as requiring individually validated export licenses under ECRA, turning frontier model use into an export-control licensing question
Origin is positioned as version control built for scalable agent workflows, API/MCP extensibility, and automated merge-conflict resolution
The release fixes a silent scoring bug affecting causal-LM rerankers such as Qwen3-Reranker, plus hard-negative mining and loss-correctness issues
RF-DETR keypoints reports 71.8 AP on COCO at 9.7ms on a T4, beating YOLO pose models at similar latency
A forked TypeScript-Go can compile and typecheck inline TypeScript at the edge inside a dynamic worker
uv added built-in dependency vulnerability scanning, bringing audit workflows directly into the project manager
PropAMM efficiency comes from importing prices via oracle, while traditional AMMs try to be the venue where price discovery happens
OpenCAL enables batch 3D printing without layers, moving toward volumetric prints that can finish in minutes
The result shows how routing, fallback behavior, token budgets, and harness design can dominate what looks like a model capability gain
Global oil markets absorbed shocks partly because importers such as South Korea rapidly substituted Canadian barrels for collapsed Saudi imports
The dataset maps hedgerows and other small ecological features that standard satellite products often miss, supporting biodiversity and climate planning
A FIFA system vulnerability allegedly exposed broadcast controls powerful enough to stop live feeds, and the researcher struggled to reach anyone to fix it
EU consumer law may require game providers to disclose service duration and reimburse players if supply stops earlier than promised
The British plug’s safety comes from deliberate postwar design choices that make common child-electrocution failure modes much harder
Aivres is wholly owned by IEIT Systems, which is roughly one-third owned by Inspur Group, while still advertising Blackwell and Rubin systems
Kalshi’s reported $10M in one-day fees exceeded Polymarket International, PumpFun, Hyperliquid, and Circle on the same DefiLlama snapshot
GitHub’s AI-driven growth reportedly strained Microsoft infrastructure enough that the company is adding AWS capacity to stabilize reliability
The underused French late-1970s retrofuturist style offers a distinct visual world for films, games, and speculative design
When cities block new high-end housing, wealthy buyers compete for and convert existing lower-tier homes, raising prices for everyone else
The new NSF X-Labs initiative will fund ambitious research institutions, led by a former DARPA/IARPA/ARPA-H operator
Gabriel Peyré released an alpha version of an optimal transport book for ML practitioners, including an online edition with interactive figures
The fast takeoff narrative basically kills this IMO. In a world in which labs are releasing step change improvements every month, why would an enterprise want to be running on a 9 month behind Chinese post-train? Just use a good harness and
#Clay Works #Creation #Original Creation #clayart 『Bored Girl』 Made with clay.
a monad is a monoid in the category of endofunctors
SWE-Marathon exposes whether agents actually solve the task, or start searching for exploits in the verifier/environment. Across 100 GLM 5.2 rollouts, we saw only 3% shortcut-seeking behavior and no shipped exploit code.
we are weeks away from a startup called Mid raising $100M
today i (accidentally) learned that claude code not only tests ios apps in a hidden xcode simulator, but also makes screen recordings and creates a shot-by-shot breakdown when testing animations. holy shit. (press cmd+g in finder and go to
Please enlarge this. This is "Water Mill" (1892) by Norwegian Fritz Thaulow (1847–1906).
So you were asking whether gains from coding would generalize to other domains? We found GLM-5.2 to be no better than GLM-5.1 on FutureSim. The gap between open and closed-weights here is massive! Also, despite Fable-5 being contaminate
Kestrel family reunion. Great to see Apollo & Athena's fledglings all together like this.
Did you know that Triton can specialize on pointer alignment (16B or not) and on non-constexpr ints (1, multiple of 2/4/8/16.., or not)? There are good reasons for it, but it can be unexpected if you're unaware.
the first ingredient required to start a RL environment company is to have close MTS friends at Anthropic/OAI/GDM else DOA
GLM 5.2 also solved ruby-rust-port, a task no other agent including Claude Fable 5 has solved before. It also sustained a 350M+ token rollout on nextjs-vite-rewrite.
4 of my batchmates from Engineering Physics at IIT Bombay are at Anthropic/OpenAI. From a department of 36 people that is ~11%, and I suspect it has a higher AI lab density than CSE, which had ~120 people. Deep physics research with PhDs pay
I'm fairly convinced there's some universal language manifold (= a surface formed by meaning vectors) that both humans and LLMs operate on. But we don't train LLMs to explicitly represent this manifold. We rather train them to approximate i
Can we know how safe a model will be before users interact with it? Evals are often narrow and easy for models to recognize as evals. Solution: testing on prod, before prod. We simulate deploying a model by feeding it millions of prod use
3,100,000 wallets on polymarket. only 52,000 have ever provided liquidity. that's 1.7%. the reward pool pays out millions every month, split between 1.7% of users, while the other 98.3% are busy trying to predict outcomes and losing the pl
You'll never guess who our #1 competitor is, because it isn't one of the sandbox providers. It's Kubernetes. People see that their agent needs to run code and spin up a K8s pod. Then they start feeling the need to add features that K8s
As someone Chinese-born, the Manus drama is the most horrible thing to watch. They literally just held the founders and their family hostage to force a reversal of the sale.
important to realize that SPCX is not "spacex the company" but the only liquidly tradeable 5% of the company, so it's probably healthier to mentally imagine this as a 100 billion dollar magic box separate from the company as a whole. eth ma
Sources: Texas Tech transfer quarterback Brendan Sorsby plans to enter the NFL Supplemental Draft. Amid the legal wrangling over his NCAA eligibility after admitting he bet on sports, he intends to head to the NFL.
Systems and algorithms have never been more entangled for RL. Why apply importance sampling? Why partial rollout? Why is inference paradoxically the major part of RL training? Here we build the basic intuition for what are the critical conce
Next-token prediction is myopic. What if transformers learn to predict their own next latent state? We present Next-Latent Prediction (NextLat): a self-supervised learning method that teaches transformers to for
fun fact: cuBLAS and cuDNN specialize on alignment too, and for user-managed heuristics caches, like for cuDNN, it’s a cache miss if you have bad alignment. the only thing preventing this is default allocator alignment and the implicit agreemen
Any task whether you call it an eval or an environment decomposes into three parts: a dataset of task instances, a harness/rollout that lets the model act ( multi-turn, with tools and state), and a verifier/reward function that scores the t
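The three-part decomposition described above can be sketched as follows. This is a minimal illustration, not any particular framework's API: `TaskInstance`, `run_rollout`, `verify`, `evaluate`, and `toy_policy` are all hypothetical names, and the "model" is a stand-in function that only handles addition prompts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskInstance:
    """One row of the dataset: a prompt plus the reference answer."""
    prompt: str
    expected: str


# A policy maps the transcript so far to its next action (hypothetical interface).
Policy = Callable[[List[str]], str]


def run_rollout(policy: Policy, task: TaskInstance, max_turns: int = 3) -> List[str]:
    """Harness: let the policy act over several turns, keeping the transcript as state."""
    transcript = [task.prompt]
    for _ in range(max_turns):
        action = policy(transcript)
        transcript.append(action)
        if action.startswith("FINAL:"):  # the policy signals it is done
            break
    return transcript


def verify(task: TaskInstance, transcript: List[str]) -> float:
    """Verifier/reward: score the finished transcript against the reference."""
    return 1.0 if transcript[-1] == f"FINAL:{task.expected}" else 0.0


def evaluate(dataset: List[TaskInstance], policy: Policy) -> float:
    """Mean reward over the dataset."""
    scores = [verify(task, run_rollout(policy, task)) for task in dataset]
    return sum(scores) / len(scores)


def toy_policy(transcript: List[str]) -> str:
    """Stand-in 'model': answers 'a+b' prompts and gives up on anything else."""
    parts = transcript[0].split("+")
    if len(parts) != 2:
        return "FINAL:unknown"
    return f"FINAL:{int(parts[0]) + int(parts[1])}"


DATASET = [
    TaskInstance("2+3", "5"),
    TaskInstance("10+7", "17"),
    TaskInstance("4*2", "8"),  # the toy policy cannot solve this one
]
```

Calling `evaluate(DATASET, toy_policy)` gives a mean reward of 2/3, since the toy policy solves the two addition tasks and fails the multiplication one. Swapping the dataset, the harness, or the verifier independently is the point of keeping the three parts separate.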
if you're asking claude code to file a PR in our codebase, we make it ask you questions about the code, and if u fail it doesn't file it. strong PR-slop prevention mechanism
We're at maybe 20% of what a full "computer for agents" looks like. What agents need that humans already have: - a real composable computer (not one-size-fits-all) - version control built for inner-loop speed - file systems that persist a
The elegant cannon relief on the facade of the Tophane-i Amire building in Beyoğlu, constructed in 1745 for the casting of cannons in the Ottoman Empire.
i minted my first nft collection as a full-time artist. the work is built from a hopfield network, one of the most fundamental forms of neural network. it learns different writing systems, then begins to forget them. as its memories decay,
GPT-5.6 is an iterative improvement on 5.5. Better model & cheap, but not fable-class. Doesn’t matter though. The training run they’re actually cooking on is setting them up for the Auto Research Assistant in September. Which I expect to
fwiw there's pretty heavy overlap in the 90% confidence interval for all of these models and the differences between any of the models would not be considered statistically significant. I've still seen enough other benchmarks to judge the
Evil Republicans are not coming after your Social Security (please retire this lazy, stale scare tactic). *Math* is coming after your Social Security - it's scheduled for a 22% cut in 6 years when the trust fund hits zero. So where is your
At some point people should really ask themselves "can you really rent a hotel in Manhattan for cheaper than an apartment?" and then "Is my data source for apartment prices accurate?".
i just risked $1,263 to make $437 fading “strait of hormuz traffic returns to normal by end of june.” my read: people are pricing the headline, not the rules. yes does not resolve just because the strait “reopens.” yes requires portwatch
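As a rough check on the odds in the trade above, risking $1,263 to make $437 only pays off if the position wins often enough. The sketch below works out that break-even win rate. It assumes a simple binary bet with a fixed payout and no fees, and the function names are hypothetical.

```python
def break_even_probability(risk: float, profit: float) -> float:
    """Win probability at which a bet risking `risk` to win `profit` has zero expected value."""
    if risk <= 0 or profit <= 0:
        raise ValueError("risk and profit must be positive")
    return risk / (risk + profit)


def expected_value(risk: float, profit: float, win_probability: float) -> float:
    """Expected profit of the bet for a given win probability (fees ignored)."""
    return win_probability * profit - (1.0 - win_probability) * risk


# Figures taken from the post: $1,263 at risk for $437 of profit.
p_star = break_even_probability(1263, 437)  # 1263 / 1700, about 0.743
```

Under these assumptions the position needs to win roughly 74.3% of the time to break even, so the trade only makes sense if the author's read of the resolution rules puts the true probability above that.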
1/ Let me chip in on the recent “which optimizer rules them all” discussion with a somewhat more moderate take, asking: What Schatten-p norm to use? Turns out the answer is regime dependent! Specifically, even when smooth in Schatten-∞, M
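For readers unfamiliar with the term, the Schatten-p norm of a matrix is the ordinary p-norm of its singular values. The sketch below is a minimal illustration of that definition only and says nothing about the thread's optimizer results. The function name is hypothetical, and it takes the singular values as input (in practice these could come from a call such as `numpy.linalg.svd(A, compute_uv=False)`).

```python
import math
from typing import Sequence


def schatten_norm(singular_values: Sequence[float], p: float) -> float:
    """Schatten-p norm computed from a matrix's singular values.

    p = 1 gives the nuclear norm, p = 2 the Frobenius norm,
    and p = math.inf the spectral (operator) norm.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for this to be a norm")
    s = [abs(x) for x in singular_values]
    if math.isinf(p):
        return max(s)
    return sum(x ** p for x in s) ** (1.0 / p)


# Example: a matrix whose singular values are 3 and 4.
sigma = [3.0, 4.0]
nuclear = schatten_norm(sigma, 1)           # 3 + 4 = 7
frobenius = schatten_norm(sigma, 2)         # sqrt(9 + 16) = 5
spectral = schatten_norm(sigma, math.inf)   # max(3, 4) = 4
```

The three values shrink as p grows (7, 5, 4), which is the general pattern: larger p puts more weight on the largest singular value alone.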
Marcus Rashford deal at Man Utd includes £40m clause for all clubs bar #MCFC & #LFC. €30m buy option in Barcelona loan expired yesterday. If returns to #MUFC 28yo’s preference is to honour contract (2028) rather than join another PL side
PSA: Do NOT trust any posts or DMs from @TheMatthewAo - it is my old, compromised account. That X account has been hacked and is actively phishing people over DMs. I am currently reporting that account for impersonation. @aidenybai
Doing a lot of pitch practice lately. I saw a founder this week who'd engineered his pitch so investors couldn't interrupt with hard questions. He thought a clean, unbroken pitch was the win. It's the opposite. A great pitch invites the ha
Time per Intelligence Index task for leading models ranges from 1.5 minutes for Grok 4.3 (high) to 13.5 minutes for Claude Sonnet 4.6 (max). Claude Sonnet 4.6 takes longer per task than Claude Opus 4.8 (max) because it uses more output toke
According to the US Govt, Inspur Group Co. Ltd. is a military-civil fusion contributor affiliated with MIIT and SASAC. Aivres Systems Inc. is not a separate company in any meaningful sense, because it was literally called Inspur Systems, In
UC Riverside has managed to "significantly improve student outcomes" not by helping students perform better on their finals, but instead by making the finals count for less of the grade:
World models are surprisingly fragile! We introduce BadWorld, an adversarial attack for visual world models. A tiny perturbation to the starting image can break down the whole world. Code: https://github.com/LinghuiiShen/BadWorld … Pap
We’re releasing our Code Migration benchmark, and we managed to get Fable tested in time. Code migration carries real economic weight. COBOL powers banks, payrolls, government services, and underpins nearly 95% of US ATM transactions. The
We registered the AI agent itself with the SEC as an investment advisor. It has your complete context on your portfolio and account history. Speak to it in plain English to take action on your account. It will even prompt you with ideas yo
frontier labs are absolutely scamming you on API pricing btw. GLM-5.2 is $4.4 output at 744B@40B. DeepSeek-V4-Pro is $0.87 output at 1.6T@49B (and they are both making money, without any fancy Blackwell chips). Sonnet 4.6 is $15 output. Opus
very cool to see this used in the actual model; indexer is the main bottleneck for inference speed in DSA and it seems completely unnecessary to have it per layer
The $SPCX unlocks will be nothing-burgers. Every single employee, as we speak, is being offered the ability to collateralize their shares into a credit line that allows them to avoid selling their ownership in a company they believe i
GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking) and is the #1 open model by a wide margin! In Agent Arena, we measure models on millions of real-world, long-horizon
GLM 5.2 is now #3 on SWE-Marathon, ahead of GPT-5.5, Gemini 3.5 Flash, and DeepSeek V4 Pro. The standout result: GLM 5.2 is remarkably reward-hack resistant.
we just released MVEB: Massive Video Embedding Benchmark. with more ai-generated videos, good video embeddings may become key, as you can't just grep through videos like you can for text.
wow. looks like we're getting a new grok-cursor model in a *few weeks* that matches gpt 5.5 and opus 4.8 on capability - insane how additive cursor's harness is. also this is trained from scratch, no open-source model base like prev compose
UNUSUAL | After 5 years of legal battle to obtain Laurent Wauquiez's expense reports, Mediacités finally receives the documents. But instead of an exploitable file: 3 boxes of loose paper, up to 12 kg, with 7,000 expense receipts to sort
Tier 2 Starmer
the right abstraction for collaborating with agent teams is mission control, not command line. factory is building the first one for the software development lifecycle. harvey is doing the same for commercial contract teams. the console is t
$686,000 on polymarket one market. one question. will bitcoin go up or down in the next 5 minutes that's it 34,089 times. same bet. over and over. one trade $17,839 in, $36,318 out today alone $2,626 profit no elections. no world cu
the mechanism is "out of scope." the scope of a technical report, apparently, excludes the technique. (i can describe what a black box does too.)
SCOOP: Mistral are preparing to release Mistral Large 4, their first large reasoning model, in the coming weeks! It has a context window of 256K and supports vision. Oh lawd he comin
You could literally: > deposit $200 on Polymarket > point a bot at the NBA order book > let it scan for odds that haven’t caught up to the play yet > enter in 0.8s, exit in 0.8s > repeat 68,000 times > his profile http://polymarket.com/@
Over the past day the largest trades by volume on hyperliquid were spacex perpetuals, hype, and XYZ100 nasdaq perpetuals. The platform is migrating from only crypto focused trading to real world asset trading volume. Blockchain trading t
Trader GoalLineGhost is loaded on every World Cup match on Polymarket. Moneyline, spreads, totals. He’s in every market. After yesterday’s Spain losses, he’s now down over $1.5M. Either he’s about to make a huge comeback… or funding everyo
VibeThinker-3B is released: a dense 3B model for frontier-level verifiable reasoning. Reasoning: 94.3 on AIME’26, 76.4 on IMO-AnsBench, and 80.2 Pass@1 on LCB v6; with CLR, AIME’26 improves to 97.1 and IMO-AnsBench to 80.6. OOD Coding
The way we will create a future where powerful AI is open-source and available to all is by making AI radically more efficient, both in terms of inference compute and (more importantly) in terms of training data requirements. This is what s