11.
Why Muon applies momentum before orthogonalization (t.co)
Momentum can act as a spectral filter on matrix-valued gradients, making the subsequent orthogonalization step in Muon more reliable
1 appearance on the backlist front page in the last 30 days.
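The claim in the summary can be illustrated with a toy experiment. This is only a sketch under stated assumptions, not the linked post's method: the gradient is modeled as a fixed rank-1 signal plus fresh Gaussian noise at every step, orthogonalization uses an exact SVD (Muon implementations typically use a Newton-Schulz iteration instead), and the names `orthogonalize`, `alignment`, and `run_demo` and all parameter values are hypothetical.

```python
import numpy as np


def orthogonalize(m):
    """Replace every singular value of m with 1 (the polar factor U @ Vt).

    Assumption: exact SVD stands in for the approximate iteration used in practice.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def alignment(o, u, v):
    """Overlap between an update o and the rank-1 signal direction u v^T."""
    return float(u @ o @ v)


def run_demo(n=16, steps=300, mu=0.95, signal=2.0, noise=1.0, seed=0):
    """Compare orthogonalizing raw gradients with orthogonalizing momentum."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)

    momentum = np.zeros((n, n))
    raw_scores = []
    for _ in range(steps):
        # Toy gradient: persistent signal plus independent per-step noise.
        grad = signal * np.outer(u, v) + noise * rng.standard_normal((n, n))
        # Momentum accumulates the signal coherently; the noise partly cancels.
        momentum = mu * momentum + grad
        raw_scores.append(alignment(orthogonalize(grad), u, v))

    momentum_score = alignment(orthogonalize(momentum), u, v)
    return float(np.mean(raw_scores)), momentum_score


raw_score, momentum_score = run_demo()
print(f"raw gradient alignment (mean): {raw_score:.2f}")
print(f"momentum alignment (final):    {momentum_score:.2f}")
```

Because orthogonalization sets all singular values to 1, any noise direction in its input is amplified to the same scale as the signal. In this toy setting, averaging with momentum first shrinks the noise relative to the signal, so the orthogonalized update should stay closer to the signal direction.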