48.
as much as we have fun building evals + environments, at some point, poor grad students (among others) become the bottleneck to improving AI system capabilities. there are a ton of domains that are technically verifiable (but not in ways that