@QingQ77 on Backlist

1 appearance on the Backlist front page in the last 30 days.

63.

Train a small 110M-parameter model from scratch using the DeepSeek-V4 architecture, making it easy to experiment … (t.co)

by @QingQ77 (Geek Lite) · backlist 2026-05-07 · rubric 92.0