9.
Spectral Lens: looking past loss curves in LLM training (t.co)
Activation and gradient spectra can expose representation geometry, forecast token efficiency early, and separate real learning gains from throughput gains
1 appearance on the backlist front page in the last 30 days.