59.
we studied some suspected effects of subword tokenization on language model training and found which of them actually mattered. this also led us to try amplifying the ones that did, resulting in the Token Superposition work we previously shared.