67.
Claude Opus 4.8 scores 58% on DeepSWE Bench, Pass@1
#2 overall, behind GPT-5.5. Keep in mind that GPT 5.6 comes out in June, with Mythos following later in the month!
Next up, Anthropic on SWE Bench Pro. This is where we see bigger jumps rather than incremental gains.
• Opus 4.6: 53.4%
• Opus 4.7: 64.3%
• Mythos Preview: 77.8%
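To put a number on those jumps, here is a minimal Python sketch that computes the release-to-release gain in percentage points. The scores are copied from the list above; the `point_gains` helper is a hypothetical name made up for this illustration and is not part of any benchmark tooling.

```python
# Scores copied from the post above (SWE Bench Pro, percent).
scores = [
    ("Opus 4.6", 53.4),
    ("Opus 4.7", 64.3),
    ("Mythos Preview", 77.8),
]


def point_gains(series):
    """Return (previous, current, gain) for each consecutive pair.

    The gain is in percentage points, rounded to one decimal place
    to hide floating-point noise from the subtraction.
    """
    return [
        (prev_name, name, round(score - prev_score, 1))
        for (prev_name, prev_score), (name, score) in zip(series, series[1:])
    ]


gains = point_gains(scores)
for prev_name, name, gain in gains:
    print(f"{prev_name} -> {name}: +{gain} points")
```

Each step here is a double-digit gain in percentage points, which is what "bigger jumps" means in practice.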
The GPT 5 series' progress on SWE Bench Pro (public) over the past 10 months!
• GPT 5.1: 50.8%
• GPT 5.2: 55.6%
• GPT 5.3 Codex: 56.8%
• GPT 5.4: 57.7%
• GPT 5.5: 58.6%
From 50.8% to 58.6% across the GPT 5 series is slow, steady, and very rea