@mohitban47 on Backlist

54.

Outcome rewards in LLM RL are sparse --> AVSD (Adaptive-View Self-Distillation) turns privileged info into dense token-level supervision, and instead of relying on only one privileged view, it combines multiple views and balances stable cr …

by @mohitban47 (Mohit Bansal) · backlist 2026-05-22 · rubric 90.0