89.
Unlike previous papers, they use the fixed-point residual itself as the halting signal. I think it's close to EqR in spirit (landscape/attractor shaping), as it modifies training with pre-norm, residual scaling, and damping. Other papers focus on t