45.
counterargument to the no-pretraining purist route is that silver’s framing assumes the reward fully specifies the task. in any domain where it doesn’t, and that’s most of the economy, the research question is “what is the minimum human dat