Active Teacher Selection for Reward Learning: now published in TMLR! (t.co)
Most RLHF systems assume feedback comes from one canonical teacher, but annotators can disagree over 30% of the time. So who should the agent ask for feedback? Paper: