2. PyTorch MPS gets specialized SDPA kernels, up to 16× faster by @Is36E (Isalia20) · backlist 2026-05-08 · rubric 96.0