I've been very interested in pretraining research recently, so to get started I reproduced the baseline modded-nanogpt setup, tweaked it to train on a single H100, and reached 3.278 FineWeb val loss in 9.37B tokens (~5 hrs)
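For a rough sense of scale, here's the average throughput those numbers imply. This is a back-of-the-envelope sketch that assumes the ~5 hrs is pure training wall-clock time on the one H100 (no eval or startup overhead), so treat the result as a ballpark, not a measured figure.

```python
# Figures from the post: 9.37B training tokens in roughly 5 hours on one H100.
# Assumption: the ~5 h is training wall-clock only, so this is an approximate average.
TOKENS = 9.37e9
HOURS = 5.0


def tokens_per_second(tokens: float, hours: float) -> float:
    """Average training throughput in tokens per second."""
    return tokens / (hours * 3600)


tps = tokens_per_second(TOKENS, HOURS)
print(f"{tps:,.0f} tokens/s")  # prints "520,556 tokens/s"
```

So roughly half a million tokens per second on average; a shorter or longer true wall-clock would shift this proportionally.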