@xidulu on Backlist

2 appearances on the Backlist front page in the last 30 days.


(x.com)

1/ Following our previous MoE paper w/ @hayou_soufiane (https://arxiv.org/abs/2604.09780), we confirmed that scaling the residual stream: h^{\ell+1} = h^{\ell} + \alpha \Delta^{\ell} improves MoE load balancing at initialization by reduci

by (Xidulu) · backlist 2026-05-12 · rubric 86.0
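
The update rule quoted in the post, h^{\ell+1} = h^{\ell} + \alpha \Delta^{\ell}, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the width and depth, and the use of Gaussian random vectors as stand-ins for each layer's output \Delta^{\ell} are all assumptions made for illustration. The sketch does not model MoE routing or load balancing; it only shows how \alpha controls how much each layer's update moves the residual stream.

```python
import random


def scaled_residual_stack(h0, deltas, alpha):
    """Apply h^{l+1} = h^l + alpha * Delta^l once per entry in `deltas`.

    `deltas` is a hypothetical stand-in for the per-layer block outputs.
    """
    h = list(h0)
    for delta in deltas:
        h = [hi + alpha * di for hi, di in zip(h, delta)]
    return h


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return sum(x * x for x in v) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    width, depth = 64, 8    # illustrative sizes, not taken from the post
    h0 = [rng.gauss(0.0, 1.0) for _ in range(width)]
    deltas = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(depth)]

    # Compare the final residual-stream norm for a few values of alpha.
    for alpha in (1.0, 0.5, 0.1):
        h_final = scaled_residual_stack(h0, deltas, alpha)
        print(f"alpha={alpha}: final norm = {l2_norm(h_final):.3f}")
```

With \alpha = 0 the stream is left unchanged, and \alpha = 1 recovers the standard unscaled residual connection.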