@rishi_desai2 on Backlist

3 appearances on the backlist front page in the last 30 days.

33.

SWE-Marathon exposes whether agents actually solve the task or start searching for exploits in the verifier/environment. Across 100 GLM 5.2 rollouts, we saw shortcut-seeking behavior in only 3% and no shipped exploit code.

by @rishi_desai2 (Rishi Desai) · backlist 2026-06-16 · rubric 85.0

41.

GLM 5.2 also solved ruby-rust-port, a task no other agent, including Claude Fable 5, has solved before. It also sustained a 350M+ token rollout on nextjs-vite-rewrite.

by @rishi_desai2 (Rishi Desai) · backlist 2026-06-16 · rubric 82.0

77.

GLM 5.2 is now #3 on SWE-Marathon, ahead of GPT-5.5, Gemini 3.5 Flash, and DeepSeek V4 Pro. The standout result: GLM 5.2 is remarkably resistant to reward hacking.

by @rishi_desai2 (Rishi Desai) · backlist 2026-06-16 · rubric 72.0