66.
Why does MTP acceptance length drop in RL? Not policy mismatch, just higher entropy. (t.co)
Rejection sampling + e2e TV loss → entropy-free. You can find the secret in https://arxiv.org/abs/2606.12370. We use it in Qwen3.5-3.7, up to 95% MTP acc
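
A rough illustration of why rejection sampling can make acceptance insensitive to entropy, assuming "TV" in the post means total variation distance and using the standard speculative-sampling acceptance rule (accept a draft token x ~ q with probability min(1, p(x)/q(x))). This is not taken from the linked paper: the function names and the "match" baseline, where a draft token is kept only if it equals an independently sampled target token, are hypothetical and chosen only for contrast.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def rejection_accept_prob(p, q):
    """Expected acceptance under speculative rejection sampling.

    A draft token x ~ q is accepted with probability min(1, p(x)/q(x)),
    so the expected acceptance is sum_x min(p(x), q(x)) = 1 - TV(p, q).
    """
    return sum(min(a, b) for a, b in zip(p, q))

def match_accept_prob(p, q):
    """Hypothetical baseline: accept only if independent samples from
    the target p and the draft q happen to be the same token."""
    return sum(a * b for a, b in zip(p, q))

# Two cases where the draft head matches the target exactly (TV = 0),
# differing only in entropy.
peaked = [0.97, 0.01, 0.01, 0.01]   # low entropy
flat = [0.25, 0.25, 0.25, 0.25]     # high entropy

for name, dist in [("peaked", peaked), ("flat", flat)]:
    print(
        name,
        f"entropy={entropy(dist):.3f}",
        f"match={match_accept_prob(dist, dist):.4f}",
        f"rejection={rejection_accept_prob(dist, dist):.4f}",
    )
```

With a perfectly matched draft head, the match baseline's acceptance falls as the distribution flattens, while the rejection-sampling acceptance stays at 1 because it depends only on the TV distance between draft and target. Under that reading, training the draft head on a TV objective targets the quantity that acceptance actually depends on.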