80. Sometimes evals don’t cover what models are capable of. by @karbon0x (Rohan Seth) · backlist 2026-05-08 · rubric 88.0