Are AI agents shape rotators? In this new benchmark, we let the models play campaign puzzles in Opus Magnum, a pu… (x.com)
Are AI agents shape rotators? In this new benchmark, we let the models play campaign puzzles in Opus Magnum, a puzzle game by @zachtronics. Ironically, Claude Opus 4.8 performed poorly, being beaten by GPT-5.5, Gemini 3.5 Flash, and GLM