GLM 5.2 (max) scores 70.1% on WeirdML, narrowly beating Gemini 3 Pro, which was released 7 months ago. It uses ~22k output tokens on average, compared to ~12k for the (high) setting. This gives a fairly clear but modest score increase (3%) over the (high) setting, showi