77.
A sparse autoencoder aims to learn a dictionary of interpretable features from a model's activations — but a lot …
A sparse autoencoder aims to learn a dictionary of interpretable features from a model's activations — but a lot can come out "dead," never firing once. On some models this is rare; on others, >70% die even with fixes like AuxK. We went dow