This graph from the NLA paper, imo, provides pretty convincing evidence that activation verbalizers surface unverbalized eval awareness. It is also crazy. Notice how the verbalized eval awareness dot is offset only when it's significantly