This was a fascinating project: it turns out that LLMs inherit a lot of traits from the LLMs they're distilled from, including in subtle ways that carry no clear semantic meaning. This has pretty interesting implications: safety problems in a model in