22.
Schedule-free spectral optimization for language-model training
Schedule-free spectral optimization that matches or beats heavily tuned AdamW on 125M- and 772M-parameter language models hints at simpler training recipes.
1 appearance on the backlist front page in the last 30 days.