8.
WorldBench: 2,000 visual questions where frontier VLMs still score 64% (x.com)
Carefully verified, visually diverse questions show that real-world visual understanding remains far from saturated for leading multimodal models
1 appearance on the backlist front page in the last 30 days.