6.
GameCraft-Bench: can coding agents build playable Godot games?
A new benchmark asks coding agents to ship complete, playable Godot projects across 140 tasks; the best current agent solves only 41.5% of them.
1 appearance on the backlist front page in the last 30 days.