82.
The next bottleneck in agentic RL training isn't the model but the environment: the executable, stateful, verifiable world an agent acts in. RL is hungry for such environments, and benchmarks (a few hundred hand-built tasks) can't feed it. So the