50.
I agree with Dwarkesh that the million-fold sample-efficiency gap is real, but it's measuring the wrong loop. Pretraining is grotesque because it takes gigawatts and a thousand rollouts per task to learn one skill. But it produces a model that