51.
Next-token prediction is myopic. What if transformers learn to predict their own next latent state?
We present Next-Latent Prediction (NextLat): a self-supervised learning method that teaches transformers to for