18.
On-policy distillation with positive-pressure tokens
Using only tokens where the teacher assigns higher probability than the student can still minimize an upper bound on on-policy distillation loss
2 appearances on the backlist front page in the last 30 days.
The quest for reliable on-policy self-distillation continues. Hopefully something will stand the test of time.
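A rough illustration of the token selection the summary describes, not the paper's actual loss: it assumes we already have per-token log-probabilities from the student and the teacher for tokens sampled from the student, and it keeps only the positions where the teacher's log-probability exceeds the student's. The function names `positive_pressure_mask` and `masked_gap` are hypothetical, and the masked mean gap is only a diagnostic, not the bound the entry refers to.

```python
import numpy as np


def positive_pressure_mask(student_logp, teacher_logp):
    """Boolean mask of tokens where the teacher assigns higher probability
    than the student (hypothetical helper, not from the paper)."""
    s = np.asarray(student_logp, dtype=float)
    t = np.asarray(teacher_logp, dtype=float)
    return t > s


def masked_gap(student_logp, teacher_logp):
    """Mean of (teacher_logp - student_logp) over the selected tokens.

    Returns 0.0 when no token is selected. This is an illustrative
    diagnostic only; the actual training objective is an assumption
    left to the source.
    """
    s = np.asarray(student_logp, dtype=float)
    t = np.asarray(teacher_logp, dtype=float)
    mask = positive_pressure_mask(s, t)
    if not mask.any():
        return 0.0
    return float((t - s)[mask].mean())


# Toy log-probabilities for three sampled tokens (made-up numbers).
student = [-1.0, -2.0, -0.5]
teacher = [-0.5, -3.0, -0.2]
mask = positive_pressure_mask(student, teacher)
gap = masked_gap(student, teacher)
```

In this toy input the second token is dropped because the student already assigns it more probability than the teacher does, so it contributes no training signal under this selection rule.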