[PATCH] D38645: [NVPTX] Implemented wmma intrinsics and instructions.

Wed Oct 11 11:27:56 PDT 2017

YuanLin added a comment.

Artem, thanks a lot for working on this!

I notice that you are taking a different approach to defining the LLVM wmma intrinsics than we (NVIDIA) do.

  http://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#nvvm-intrin-warp-level-matrix

Specifically, your patch embeds the layout and memory space in the intrinsic name, while ours treats them as constant arguments. We did this to reduce the number of intrinsic functions the optimizer and codegen have to deal with. We plan to add more wmma features in the next few CUDA releases, so it would be better to unify the syntax and naming of the wmma intrinsics now. That would also make cross-support much easier.
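
To illustrate the difference in intrinsic count, here is a rough sketch in Python. The attribute values, the intrinsic names, and the integer encodings are all hypothetical and are not the actual spellings used in the patch or in the NVVM IR spec; the point is only that name-embedded attributes multiply the number of intrinsics, while constant arguments keep one intrinsic per operation.

```python
from itertools import product

# Hypothetical attribute values, for illustration only.
LAYOUTS = ("row", "col")
SPACES = ("generic", "shared", "global")


def embedded_names(op):
    """One distinct intrinsic per (layout, space) pair, encoded in the name."""
    return [f"{op}.{layout}.{space}"
            for layout, space in product(LAYOUTS, SPACES)]


def constant_arg_call(op, layout, space):
    """A single intrinsic; layout and space are passed as constant operands.

    The integer encoding (index into the tuples above) is an assumption.
    """
    return f"{op}(ptr, i32 {LAYOUTS.index(layout)}, i32 {SPACES.index(space)})"


if __name__ == "__main__":
    # Name-embedded style: the optimizer and codegen see every variant.
    for name in embedded_names("wmma.load.a"):
        print(name)
    # Constant-argument style: one name, attributes carried as operands.
    print(constant_arg_call("wmma.load.a", "col", "shared"))
```

With these made-up sets, the name-embedded style yields 2 x 3 = 6 intrinsics per operation, and each new attribute multiplies that count, whereas the constant-argument style only adds an operand.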

Would you be able to revise the patch? Highly appreciated.

Thanks.

Yuan

https://reviews.llvm.org/D38645