Progress in coding agents has largely been driven by progress in evals. I still remember when Devin was the first to reach 13% on SWE-Bench in 2024, and with just two short years of RL, SWE-Bench scores are now 75%+. It's uncanny that 13% is al