40.
Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (