56.
(1/6) New μP paper:
(1/6) New μP paper: Under μP, GQA passes coordinate checks but fails to transfer learning rate! That contradiction isn't just a quirk of GQA. It exposes a silent failure mode in the TPV framework that the original theory simply can't dete