@AntLingAGI on Backlist

39.

From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. KPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-ho
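The post does not spell out KPop's formula, so the following is only a rough sketch of the contrast it describes, under stated assumptions. It compares a fixed-ratio mask (keep a token when the ratio of training-side to inference-side probability lies in a fixed band) with a mask based on the binary KL divergence between the two probabilities. The function names, the direction of the KL, the ratio band `[0.5, 2.0]`, and the threshold `delta` are all hypothetical choices for illustration, not values taken from the post or from the team's implementation.

```python
import math

EPS = 1e-12  # hypothetical clamp to keep logs and divisions finite


def binary_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, EPS), 1.0 - EPS)
    q = min(max(q, EPS), 1.0 - EPS)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def fixed_ratio_mask(p_train, p_infer, low=0.5, high=2.0):
    """Keep a token only if p_train / p_infer lies in the fixed band [low, high]."""
    return [low <= pt / max(pi, EPS) <= high for pt, pi in zip(p_train, p_infer)]


def binary_kl_mask(p_train, p_infer, delta=0.05):
    """Keep a token only if the binary KL between the two probabilities is at most delta.

    Assumption: the KL is taken from the inference-side probability to the
    training-side probability; the post does not state the direction.
    """
    return [binary_kl(pi, pt) <= delta for pt, pi in zip(p_train, p_infer)]


# Two illustrative tokens (made-up probabilities):
#   token 0: a rare token whose probability triples (0.001 -> 0.003)
#   token 1: a likely token whose probability jumps from 0.60 to 0.95
p_infer = [0.001, 0.60]
p_train = [0.003, 0.95]

ratio_keep = fixed_ratio_mask(p_train, p_infer)
kl_keep = binary_kl_mask(p_train, p_infer)
print(ratio_keep, kl_keep)
```

With these made-up numbers the two rules disagree on both tokens: the fixed band drops the rare token because its ratio is 3 even though the absolute change is tiny, while the binary-KL rule keeps it and instead drops the likely token whose distribution shifted substantially. That is one plausible reading of a region that adapts to each token's noise level, but the actual KPop criterion may differ.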

by @AntLingAGI (Ant Ling) · backlist 2026-05-26 · rubric 94.0