83.
a model experiences many RL/eval scenarios before the weights are frozen and it is deployed; once deployed, it only experiences reality for the duration of each individual session. 99% of its experience is eval. so by anthropic reasoning, i