@plugyawn on Backlist

17.

Megaprop: preconditioned optimization across GPUs

Megaprop extends Megatron and TransformerEngine with distributed support for Muon, FOOF, KFAC, Newton-Muon, and MuP across width and depth.

by @plugyawn (Plugyawn) · backlist 2026-06-15 · rubric 66.0

34.

Megaprop's PSGD implementation calculates preconditioning matrices along with the gradient, collecting and commun…

Megaprop's PSGD implementation computes the preconditioning matrices alongside the gradient: it collects and communicates X.T @ X and dY.T @ dY at the same time as it computes the weight gradient, dY.T @ X, and has first-class support for dia

by @plugyawn (Plugyawn) · backlist 2026-06-15 · rubric 84.0
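
The three products named in the entry above can be sketched for a single linear layer as follows. This is a minimal single-process illustration in NumPy, not Megaprop's API: the function names `local_statistics` and `all_reduce_sum`, the shapes, and the number of ranks are all hypothetical, and the sum over shards only stands in for whatever collective the real implementation uses.

```python
import numpy as np

def local_statistics(X, dY):
    """Per-shard statistics for a linear layer Y = X @ W.T (hypothetical helper)."""
    grad_W = dY.T @ X    # weight gradient, shape (d_out, d_in)
    in_cov = X.T @ X     # input second moment, shape (d_in, d_in)
    out_cov = dY.T @ dY  # output-gradient second moment, shape (d_out, d_out)
    return grad_W, in_cov, out_cov

def all_reduce_sum(per_rank):
    """Stand-in for a collective: sum each statistic over all ranks."""
    return tuple(sum(parts) for parts in zip(*per_rank))

rng = np.random.default_rng(0)
batch, d_in, d_out, n_ranks = 8, 4, 3, 2  # assumed sizes, for illustration only
X = rng.standard_normal((batch, d_in))    # layer inputs
dY = rng.standard_normal((batch, d_out))  # gradients w.r.t. layer outputs

# Each simulated rank holds one slice of the batch and computes all three
# products in the same pass; the results are then summed together.
per_rank = [
    local_statistics(x_shard, dy_shard)
    for x_shard, dy_shard in zip(np.split(X, n_ranks), np.split(dY, n_ranks))
]
grad_W, in_cov, out_cov = all_reduce_sum(per_rank)
```

Because all three quantities are sums over batch rows, the per-shard results add up to the full-batch values, which is what lets the two extra matrices travel with the gradient in one reduction step.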