6.
Transformers Still Hit Plasticity Loss
The paper says long training eventually erodes a model’s ability to adapt, even in stationary pretraining setups
2 appearances on the backlist front page in the last 30 days.
We find that plasticity loss happens both in nonstationary continual-learning environments and in stationary pretraining-like setups. This means that if you pretrain long enough, your LLM will eventually lose the ability to adapt to new