@Tiberiu_Musat_ on Backlist

40.

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address t…

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (

by @Tiberiu_Musat_ (Tiberiu Mușat) · backlist 2026-05-27 · rubric 92.0