3.
ExploitGym: measuring whether AI agents can turn CVEs into working exploits
The benchmark tests autonomous exploitation of complex real-world targets, moving the AI cyber-risk discussion from hypotheticals to measured attack capability
1 appearance on the backlist front page in the last 30 days.