An excerpt from my upcoming paper: Renormalization Group Theory of Learning
Among other things, the RG approach helps explain what the Muon optimizer is doing. Here, I show a simple experiment (MNIST/MLP3) where AdamW overfits almost imm