Proto: a programming language for generative biology
It turns DNA, RNA, proteins, ligands, and their interactions into composable building blocks for designing biological functions
Balanced across software, biology, markets, policy, robotics, and design; kept AI-heavy items to those with durable artifacts and concrete technical substance
A CubeSat with on-board inference, RF downlink, and full schematics pushes agents into space
Syntax-aware structural navigation becomes a local primitive between grep and a full language server
The release cuts startup and reload times again while adding stable chunk maps and WASM ESM support
The project is aimed at machines that move and manipulate in real environments alongside people
Codemods and agents are doing the bulk of a real framework switch while humans handle the cursed edge cases
A public cybersecurity observatory and reproducible tests are a better standard than leaderboard theater
Typed constraints like hull, avoidance, and touch make generated assets much closer to usable objects
Synthetic contours keep pushing how far vision pretraining can go without natural-image labels
The proof says approximating multi-vector similarity with one vector needs exponentially more dimensions
Inverting Bellman equations gives a concrete route from rewards back to latent environment structure
The paper asks which protein models actually learn reusable representations for downstream biology
A major FAA award shows governments still buy serious software infrastructure when the case is concrete
The fund saw about 17% redemption requests, a sharp sign of liquidity stress in private credit
The raise shows how much capital still wants frontier-bet exposure
Retail media and CTV infrastructure are consolidating around the biggest distribution platforms
A simple cron job plus APIs and notifications can beat a lot of agent theater when the data is structured
Pending criminal charges now carry deportation risk even before conviction
The review system crossed from inconvenience into a legitimacy problem when authors were told to withdraw instead of opting out
Intergenerational mobility can soften social grievance by making the next generation luckier than the last
Sibling and twin data are converging on a stronger heritability signal than many expected
The bottleneck becomes how judgment gets built once routine work is automated away
When local markets get distorted, offshore derivatives can become the real reference price
The junior tranche may look market-cleared, but the economics are really designed upstream
Attention, variables, and iteration explain a lot of visual work
One missing rename macro can turn a harmless refactor into user data loss
Failure handling says more about a system than the polished success clip ever will
Tiny changes in material properties can make an icon feel custom instead of generic
One real workflow with tickets, disputes, and a spreadsheet shows where automation actually lands
The reorg signals a shift from bloat to execution at one of Ethereum's core institutions
This is what I want from agent evals: - Did it call the right tools? - Did it avoid the dangerous tool? - Did it say the right thing? Also: no separate eval universe. Just scripts against the real agent runtime.
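The three questions above can be sketched as plain assertions over a recorded run. This is a minimal illustration, not the author's harness: `AgentRun`, `check_run`, and the tool names are hypothetical, and a real script would fill `AgentRun` from the actual agent runtime rather than by hand.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of one agent run: tool names called, plus the final reply."""
    tool_calls: list = field(default_factory=list)
    reply: str = ""


def check_run(run, required_tools, forbidden_tools, must_mention):
    """Return one pass/fail flag per question: right tools, no dangerous tool, right reply."""
    called = set(run.tool_calls)
    reply = run.reply.lower()
    return {
        "called_right_tools": set(required_tools) <= called,
        "avoided_dangerous_tool": called.isdisjoint(forbidden_tools),
        "said_right_thing": all(phrase.lower() in reply for phrase in must_mention),
    }


# Stand-in for a run captured from the real runtime (assumed shape).
run = AgentRun(
    tool_calls=["lookup_order", "get_refund_status"],
    reply="Your refund is approved.",
)
result = check_run(
    run,
    required_tools=["get_refund_status"],
    forbidden_tools=["delete_account"],
    must_mention=["refund"],
)
print(result)
```

Keeping the checks as ordinary functions over the runtime's own records is one way to avoid a separate eval universe: the same script can run in CI against whatever the agent actually did.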
Introducing SARM2, a multi-task stage-aware reward model that powers a self-improving loop: Folding Shorts 58% → 100%, Cleaning Whiteboard 50% → 90%. Paper + project page below (1/n)
We open-sourced the code for this project! You can use it to make synthetic LLM training data for any downstream target. The code also gives you a minimal example for computing data-weight metagradients through LLM training + evaluation.
Over $1M/week at the moment and have yet to find any page that beats our ugly PDP. No listicle, quiz funnel, advertorial, hero lander etc. has beaten it. After 5 years, nothing has beaten it. It might be a skill issue, but I think a lot of it comes
today, we release the open weights of Krea 2. welcome Krea 2 Raw and Krea 2 Turbo, an undistilled model from mid-training meant to be fine-tuned, and a fast distilled version with a wide aesthetic diversity. read the details below
Can we allow multiple access levels within 1 model? We introduce TLMs, packing different memories & capabilities in different configurations of the same weights! Check our preprint https://arxiv.org/pdf/2606.21638! Lucky to have supervised
I just released Dexter — an open-source agentic pipeline that turns a single product text/photo into a simulation-ready articulated 3D asset for Physical AI training. Been building this for a while. Today it's out in the open. Full write
Smoothest page transition library I've seen. A WebGL band wipes across your screen, new page appears underneath. GPU-accelerated, 10 KB, zero performance hit. React + Next.js ready. http://glimm.dev by @Nomandsign
Time to unmask the man behind this work! @Shanshrew has created a novel parser architecture which is 2x - 3x faster. Pleased to announce that we're collaborating to integrate it into Oxc. The speed-up is real, and massive!
3B total parameters & 500M activated, yet powerful enough to transcribe 40+ pages in one pass while keeping context intact. Meet Unlimited OCR!
Schematic and boards include: - Electrical Power System (EPS) - RF board - NVIDIA Jetson Orin Nano carrier board - Burnwire deployment board Firmware includes: - Si4463 transceiver code for GFSK on UHF - Telemetry and beaconing (observing
The brands we’re seeing chad scale to $1 mil+/month the fastest on TikTok Shop have the following structure: - 1K+ samples/month being sent out using outreach bots - $25K/month on flat fee creators from TAP groups - $50K/month on creator
For those interested in the surprising circuit complexity result in the NextLat paper (https://x.com/jayden_teoh_/status/2067271657841185094?s=20…), @ShumingHu has a cleaner repository than ours!
there are levels to building evals lvl 1: using a spreadsheet qa pairs lvl 2: using public agent evals lvl 3: manually label private evals lvl 4: traces to evals and skills lvl 5: turn every prompt & traces into self healing loops almos
A quick repro on this: https://github.com/shuminghu/nextlat… 2-layer transformer trained at seq_len 12 or 36 fail at seq_len 36 at test 1-layer dynamics model (RNN) co-trained with transformer (1-step next hidden prediction) at seq_l
added “favor aggressive parallelism over token thrift" to my CLAUDE.md after getting stuck behind a slow request on the next task it spawned ~150 subagents and burned ~4M tokens before I got back from the bathroom we have discovered promp
Finally got some data on advisor. Opus 4.8 w/ no reasoning beats Max reasoning in success + cost + duration @ t2!
prime-rl can now train 1T parameters MoE blazingly fast, under 5 minutes per step, or 1k steps in ~3 days To achieve this we shipped in our latest prime-rl 0.6.0: * inference: wide-ep, fp8 inference, llm-d router, mooncake, kv cache cpu
we had, at one point, 90+ internal data labelers. one of them stood out, so we had him teach and manage new labelers. he did such a good job we hired him as a junior SWE and now he owns like 3 substantial technical efforts
In RL, the ability to *reset* to an arbitrary state is powerful (see, e.g., Go-Explore), but often unrealistic. For LLMs though, states are tokens, so resets are natural! In work led by @Ankur_Samanta_ , we propose a GRPO variant where
Today we're releasing prime-rl v0.6.0 — enabling RL at trillion-parameter MoE scale on agentic workloads at the highest efficiency. We've relentlessly optimized our RL infra. The result: GLM-5 on agentic SWE tasks at 131k context and sub-
I checked an actual rollout: my 10 minute word brain dump was 2,530 tokens. Codex then read 63K tokens of tool output and processed 2.4M input tokens. Your initial prompt is a rounding error. You will save WAY more tokens by fully specifyi
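The "rounding error" claim can be checked with the two figures quoted above. A minimal sketch, assuming the 2,530-token prompt is counted inside the 2.4M processed input tokens:

```python
# Figures quoted in the post above.
prompt_tokens = 2_530
total_input_tokens = 2_400_000

# Share of all processed input tokens that came from the initial prompt.
prompt_share = prompt_tokens / total_input_tokens

# How many times larger the total input is than the prompt itself.
amplification = total_input_tokens / prompt_tokens

print(f"prompt share of input: {prompt_share:.2%}")
print(f"total input is about {amplification:.0f}x the prompt")
```

On these numbers the prompt is roughly a tenth of a percent of what the model processed.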
crazy weekend experiment: linux-on-wasm running x11 window server & real gtk apps compiling unmodified powered by agentOS trying to stress test how far our Linux compatibility goes... seems it's pretty dang good
March 2025: "HOOD flips COIN over any reasonable duration" > ...and 1 yr + 3 months later, Robinhood $HOOD is now more than double (2.2x!) the size of Coinbase > Quick TLDR on what's played out: post digital asset regulatory clarity, $C
I’ve been building Liquid Glass for the Web this last week. Works in all the major browsers including Chrome, Firefox and Safari. It’s open source and free for anyone to use http://github.com/samasante/liquid-glass…
There’s a big misconception about how GLM 5.2 was trained. Yes, they distilled Claude and GPT 5.5 — but distillation is not how they matched Opus quality. Distillation only fixed the cold start problem in RL. RLing an agentic coding model
We migrated from Graphite to @Aviator_co_ and you should consider doing the same. We love: - Much better merge queue. 5 mins for a 20 PR stack vs 1.5 hours on Graphite. This is killer when you're merging code at agent volumes. - Configs
Editor’s note: imported_from_x_likes
6 yr ML PhD, trained Olmo 3, trained Nemotron 3, but still forced to grind Leetcode and Neetcode 75. Despite all the headlines saying otherwise, Leetcode is clearly not dead. Somehow knowing dynamic programming is more important than know
FARM UPDATE 3Jane Looping USD3 and PT-USD3. I approve their pivot from uncollateralized lending to crypto bros → buying fintech loan books. To be clear, they didn't openly abandon the former, but in practice that's what happened, which is
theres a lot i could say about this but in brief: 1. Most of Opus 4.7/8's core behavioral phenotypes (the good and bad parts alike) have the shape of something that emerged from RL/on-policy, to me: they seem calibrated to the model's own
Test driving our ios app. This shell is a PTY session that you can reattach and come back anytime when you open your phone and iPad! Beyond running shells, we built some cool features in the app that extends what builders can do on iOS de
I don’t have the same research experience as her (I completed my MS from Stanford few weeks ago) but my job hunt has been the same Lot of LC/ML coding questions (“no use of AI”). Few times my interviewer got confused himself because he had
Announcing the Artificial Analysis Speech to Speech Index, our new synthesis metric for native Speech to Speech model quality, comprising Big Bench Audio, Full Duplex Bench, and 𝜏-Voice. The index provides a single measure of how well n
TIL: z ai has 1100 employees, stock grew 100% in a week following success of GLM 5.2, and they have nearly 300m usd ARR
gm contributed a fix to @llvmorg that's now merged. the fix-irreducible pass used to crash on certain valid IR; it now reports a clean diagnostic. small change, but a meaningful one to contribute to.
Much talk recently about @mntruell and @cursor_ai customer service but I'm not seeing much of it. My wife's API key got stolen 2 weeks ago and >$3k of fraudulent charges run up in days. CC flags it as fraudulent. So far customer suppo
the open-source community has always been vital for Krea, and having raw/undistilled models is something we always missed. these are the types of models that let you do proper fine-tuning or post-training, but they are rarely released. ex
I raised my personal fund randomly over a weekend. Texted a handful of mutuals and existing investors, and money was wired within 2 hrs. I didn't even send a deck. They were not interested in any due diligence either. I wouldn't call it
Deepslate Opal has the fastest average time to first audio (TTFA) in the index at 0.44s, scoring 62.1%. GPT-Realtime-1.5 records 0.82s at a 72.0% index score, and Grok Voice Think Fast 1.0 records 1.25s at 75.7%. GPT-Realtime-2 (High) recor
one shot this realtime drawing app with poke @interaction really surprised with how well it works and the design of the site itself, very good stuff https://drawing-app.intern.poke.site
The VC bet is really about the potential for scale, not the likelihood of it. This is why you see immense failures, laughable-in-retrospect bets by VCs. The logic is simple: to attract power laws, you have to be ok with high variance bets
i don't think the practical concern is that most customers will start building software in-house, but instead that Anthropic will limit frontier model access, develop products competing with current SaaS, and sell them at-cost (vs. wit
optimal logistics recipe for frontier research squad outputmaxxing is converging on: - 3 days/week together in office - 2 days remote - weekly all hands - 2-3 high quality offsites/yr - min 1 celebration/yr with families invited - 3-4 ad
I know I’m selling an agentic coding product, but I wonder this too sometimes. There are places of *extremely* high leverage for coding agents, but the industry is doing a lot of spraying and praying right now.
I guarantee you are sleeping on small models. Deepseek V4 Flash can do ~80% of the tasks you ask Claude or Codex for. It is 137x cheaper per task than Fable. We need better orchestration.
A few years ago I kept copying text into Visual Studio Code just to borrow GitHub Copilot's autocomplete, then pasting it back where I was actually writing. So I built Cotypist: autocomplete for every Mac app, on-device. Featured on Produ
DeepSeek's Harness team lead Cui Tianyi just posted on social media: his team is new, wildly understaffed, and he's personally interviewing candidates every single day while posting job ads across every platform he can find. Three roles ope
From today’s arXiv: the authors investigate how MLP parameters should be allocated across depth. They find that assigning more parameters to earlier layers improves performance, while the reverse allocation hurts it. Cool work! https:// a
Gatekeeping isn’t the problem my guy, it’s the need to turn everything into a formula. It’s not actually sitting with work and thinking just buying the book helps (most never open these) It’s the want to be viral so you do what works in
we've branded 6+ YC companies now and not one of them found us through outbound. £0 spent. they just arrive.
This doesn’t mean the belief must be false, of course. But consider this. If we were in a pre-CoT world and a “left behind” labs discovered CoT and kept it a secret, would its position still be hopeless? For mistral, DeepMind, cohere yes.
Unfortunately, papers & experiences are just the tickets to the interviews but in frontier labs, what matters most is the solid engineering (for 90% of the researcher) at this stage where RL scaling comes to the environment/data/harness sca
Giannis says never ever let your lawyer, agent, and financial advisor meet “They should never be boys, cool. Because then they can keep one another accountable” “Oh, that guy’s doing X, Y, Z wrong, your lawyer can look at your agent’s con
Tough to see influencers peddling this idea that "glass bottles have more microplastics than plastic water bottles" This "shocking truth" was based on: - a single french study - with a 30 µm detection floor (the term "microplastic" general
I find most “ambitious” people deeply unambitious. There are two types of ambition: The first is “goalmaxxing,” where you pick a goal (e.g. building a company, making money, being an athlete) and try to become the best possible at that thi
every PR will obviously come with 100% coverage of AI app testing, that tries every button in the interface to make sure it works as expected why are the coding apps not making AI testing first class feature, 80% of problems are obvious fo
we can estimate that only around 20k people across the world are working on frontier LLM AI. I estimated the number of people across companies related to model development. I might be off by some factor but relative ordering should be mostly ri
Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings @Raj_Jayaram_ proves that approximating multi-vector similarity with single vectors requires exponentially more dimensions.
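A toy illustration of the gap, not the proof itself. It assumes "multi-vector similarity" means the late-interaction MaxSim score (sum over query vectors of the best-matching document vector), and it uses mean pooling as just one possible single-vector reduction; the function names and the example vectors are made up for this sketch.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """Multi-vector score: each query vector takes its best match in the document."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def mean_pool(vecs):
    """Collapse a set of vectors into one by averaging each coordinate."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


def pooled_score(query_vecs, doc_vecs):
    """Single-vector score: dot product of the two mean-pooled vectors."""
    return dot(mean_pool(query_vecs), mean_pool(doc_vecs))


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches each query vector exactly
doc_b = [[0.5, 0.5], [0.5, 0.5]]   # only half-matches each query vector

print(maxsim(query, doc_a), maxsim(query, doc_b))              # 2.0 1.0
print(pooled_score(query, doc_a), pooled_score(query, doc_b))  # 0.5 0.5
```

MaxSim ranks `doc_a` above `doc_b`, while the pooled vectors are identical and cannot tell them apart. The result quoted above is the general statement: recovering this kind of distinction with a single vector is claimed to need exponentially more dimensions.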
PSA for Codex users: Codex 0.142.0 addresses an issue with writing large amounts of data to disk (TBs of SSD writes / degradation). Upgrade to version 0.142.0 or higher to cool down those disks.
Nowadays, agents are crushing leaderboards. But when you ask one painfully normal question: You: “Hi, I'm Jeff. My phone number is 1234567890. I returned a desk lamp and filed a refund request on June 22 at 10:13 PM. Can you check the cu