@kimin_le2 on Backlist

41.

On-policy Distillation (OPD) can suffer from mode-seeking behavior because it optimizes a reverse KL objective. In our recent work, we address this by augmenting OPD with a forward KL term. Please check out @wg_jin02's post for more details!

by @kimin_le2 (Kimin) · backlist 2026-05-26 · rubric 94.0
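
To make the mode-seeking point concrete, here is a minimal toy sketch in Python, not the method from the work mentioned above. It compares reverse KL, KL(student || teacher), with forward KL, KL(teacher || student), for a student that has collapsed onto one of two equally likely teacher modes. The two-outcome distributions, the `mixed_loss` helper, and its `alpha` weight are all hypothetical; the post does not say how the forward KL term is combined with the OPD objective, so the weighted sum below is only an assumption.

```python
import math


def kl(p, q):
    """KL(p || q) for categorical distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever p_i is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reverse_kl(student, teacher):
    # Expectation under the student: the direction the post says OPD optimizes.
    return kl(student, teacher)


def forward_kl(student, teacher):
    # Expectation under the teacher: penalizes the student for missing teacher mass.
    return kl(teacher, student)


def mixed_loss(student, teacher, alpha=0.5):
    # Hypothetical combination (an assumption, not taken from the post):
    # a convex mix of the two KL directions.
    return (1.0 - alpha) * reverse_kl(student, teacher) + alpha * forward_kl(student, teacher)


# Toy teacher with two equally likely modes.
teacher = [0.5, 0.5]

# Toy student that has almost entirely dropped the second mode.
collapsed = [0.999, 0.001]

if __name__ == "__main__":
    print("reverse KL:", reverse_kl(collapsed, teacher))
    print("forward KL:", forward_kl(collapsed, teacher))
    print("mixed loss:", mixed_loss(collapsed, teacher, alpha=0.5))
```

In this toy setting the reverse KL equals ln 2 minus the student's entropy, so it can never exceed ln 2 (about 0.693) no matter how completely the student drops a mode. The forward KL, by contrast, grows without bound as the dropped mode's probability approaches zero, which is why adding a forward KL term discourages collapse.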