Not at all crazy. 1. For many (most?) important problems IRL, you're collecting data from the wild and cannot sample completions from the base model. Training RMs has had poor-to-mixed success because of distribution shift. If you're worki