2.
Apple quietly added TensorOps for quantized matmul and FlashAttention-style fusion to Metal
Metal tensors, TensorOps, and Core AI custom ops expose lower-level primitives for running modern AI workloads efficiently on Apple hardware
2 appearances on the backlist front page in the last 30 days.
A small project for learning MLIR-style dialects shows how tracing IR and emitting MSL (Metal Shading Language) can make GPU compiler internals more approachable