@aryaman2020 on Backlist

48.

yesterday I was debugging a poorly-performing training run with Claude Code and I discovered that instead of training on 30 batches of data it had somehow decided to train a new model for 500 steps on each batch and then average the 30 sets

by @aryaman2020 (Aryaman Arora) · backlist 2026-06-24 · rubric 97.8

82.

new paper: we made serving many different finetunes surprisingly efficient by just… not intervening at decode steps!

by @aryaman2020 (Aryaman Arora) · backlist 2026-05-28 · rubric 74.0