@manoelribeiro on Backlist

SciConBench: frontier agents struggle to synthesize scientific conclusions (x.com)

A 9.11k-question benchmark from Cochrane systematic reviews tests whether AI agents can synthesize scientific evidence rather than merely retrieve facts.

by @manoelribeiro (Manoel) · backlist 2026-06-11 · rubric 91.0