@MParakhin on Backlist

61.

We are measuring a directionally similar but even more striking difference: 5.5 is a better base model, but the drastically reduced thinking budget (at the same xhigh setting) makes it worse on high-complexity tasks such as bug finding. We need to be

by @MParakhin (Mikhail Parakhin) · backlist 2026-05-10 · rubric 78.0