3.
SciConBench: frontier agents struggle to synthesize scientific conclusions (x.com)
A 9.11k-question benchmark derived from Cochrane systematic reviews tests whether AI agents can synthesize scientific evidence rather than merely retrieve facts.
1 appearance on the backlist front page in the last 30 days.