@s_batzoglou on Backlist

32.

I find GLM-5.2 currently unusable for hard reasoning tasks. I gave it 11 induction problems from my benchmark (ICML 2026, https://arxiv.org/abs/2602.18956).
- 4 out of the 11 completed, the rest failed; 2 correct
- Average time per compl

by @s_batzoglou (Serafim Batzoglou) · backlist 2026-06-20 · rubric 99.5