2. ProgramBench: 200 isolated whole-repo generation tasks for coding agents by @18jeffreyma (Jeff Ma ICLR’26) · backlist 2026-05-05 · rubric 96.0