41.
Modded-NanoGPT optimization result #12: Transferring good hparams from recent NorMuon records -- in particular, t… (x.com)
Modded-NanoGPT optimization result #12: Transferring good hparams from recent NorMuon records -- in particular, taking final val 25 steps early following @wen_kaiyue's NorMuonH, and lr=0.035 following Liming Liu's NorMuon -- improved the