We’ve found an empirical law governing plasticity loss in transformer models. The surprising part: pretraining on uniform data distributions doesn’t seem to make models immune to it.