Just imagine: you create a model (MinMax M.27) that scores the SAME as Opus 4.6 on SWE Bench PRO. But when we create a benchmark your model wasn't trained on, you literally score 0. Because MinMax models are shit, and incomparabl