Yeah, I think this is a fair concern. One practical issue is cost: a single 24h Codex run already consumes around 100M tokens, so extending this to the full two-week human window across multiple tasks/trials would quickly reach the 10B-tok