39.
From IcePop to KPop — our team keeps pushing on RL training stability for large MoE models. KPop replaces the fixed-ratio mask with an adaptive binary-KL region that matches each token's inherent noise. More robust updates, stable long-ho