https://github.com/egebeysel approved this pull request.

Overall, LGTM! That's probably something for a separate PR, but can we also add `i8mm` and `bf16` versions of this? I think we should be able to support that at the moment.

https://github.com/llvm/llvm-project/pull/157815