@tessera_antra on Backlist

43.

Using off-policy prefixes (rollouts of another model) gives the game away: the model would learn to classify off- vs. on-policy even better than models already do. You would get higher eval awareness, not lower, even though it would be bette

by @tessera_antra (antra) · backlist 2026-06-17 · rubric 83.0