@liulicheng10 on Backlist

87.

Probably the best blog I have read in some time

Viewing SFT, RL, and OPD as different ways of reshaping a model's distribution makes their tradeoffs super intuitive.
- SFT pulls toward a fixed external target
- RL moves along the reward

by @liulicheng10 (Licheng Liu) · backlist 2026-06-15 · rubric 72.0