49.
I bet they used BF16 peak throughput as the denominator when training in FP8 or something. By that algebra, I can get you 150% MFU in no time. For reference, as far as I know the SOTA Hopper GEMM kernel reaches ~84% utilization. https:// arxiv.org/a
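To make the algebra concrete, here's a rough sketch of how the denominator changes the number. The peak figures are assumed H100 SXM dense datasheet values (roughly 989 TFLOP/s for BF16 and 1979 TFLOP/s for FP8), and the 75% run is made up for illustration, not taken from any real report.

```python
# Assumed peak dense tensor throughput for an H100 SXM, in TFLOP/s.
# These are approximate datasheet-style figures, not measured values.
PEAK_TFLOPS = {"bf16": 989.0, "fp8": 1979.0}


def mfu(achieved_tflops: float, denominator_dtype: str) -> float:
    """Model FLOPs utilization: achieved throughput over the chosen peak."""
    return achieved_tflops / PEAK_TFLOPS[denominator_dtype]


# Hypothetical FP8 training run that sustains 75% of the FP8 peak.
achieved = 0.75 * PEAK_TFLOPS["fp8"]

honest = mfu(achieved, "fp8")     # denominator matches the training dtype
inflated = mfu(achieved, "bf16")  # BF16 peak used as the denominator instead

print(f"MFU vs FP8 peak:  {honest:.0%}")
print(f"MFU vs BF16 peak: {inflated:.0%}")
```

Same run, same achieved FLOP/s: against the FP8 peak it's a plausible 75%, against the BF16 peak it comes out around 150%, because the FP8 peak is roughly twice the BF16 one.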