46.
Things DeepSWE does well (compared with the other long-horizon benchmarks out there): a 0.3% false-positive rate vs. SWE-Bench Pro's 8.5%, with an independent LLM-analyzer audit on every trial; pretty good contamination resistance, as seen from the canary GUID; fairly