8.
NanoGPT-Bench: coding agents recover 9.3% of human AI R&D progress
On a months-long AI R&D task, Codex, Claude Code, and Autoresearch mostly tuned hyperparameters and recovered only 9.3% of human progress
1 appearance on the backlist front page in the last 30 days.
On a months-long AI R&D task, Codex, Claude Code, and Autoresearch mostly tuned hyperparameters and recovered only 9.3% of human progress