48.
yesterday I was debugging a poorly performing training run with Claude Code and discovered that, instead of training on 30 batches of data, it had somehow decided to train a new model for 500 steps on each batch and then average the 30 sets