32. Analysis of why the optimizer for pretraining and finetuning should be the same, and characteristics of memorizat… by @rosinality (Rosinality) · backlist 2026-05-08 · rubric 94.0
79. One more study on the scaling law under data repetition. (And whether higher weight decay could be beneficial for… by @rosinality (Rosinality) · backlist 2026-05-05 · rubric 84.0