[llvm] [LLVM][NVPTX] Add codegen support for tcgen05.{ld, st} instructions (PR #126740)

Thu Feb 13 22:24:57 PST 2025

================
@@ -1175,6 +1175,101 @@ For more information, refer to the PTX ISA
 `<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensorcore-5th-generation-instructions-tcgen05-fence>`_.
 
 
+'``llvm.nvvm.tcgen05.ld.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare <n x i32> @llvm.nvvm.tcgen05.ld.<shape>.<num>(ptr addrspace(6) %tmem_addr, i1 %pack)
+
+  declare <n x i32> @llvm.nvvm.tcgen05.ld.16x32bx2.<num>(ptr addrspace(6) %tmem_addr, i64 %offset, i1 %pack)
----------------
schwarzschild-radius wrote:

> In order to work, intrinsic return type should've been [llvm_i32_ty, llvm_i32_ty]. Or, alternatively, the pattern would need to change so that it matches the single llvm_v2i32_ty value returned by the instruction. I think you want the former.

Got it. We would prefer the vector interface over a list of types (struct) because:

1. PTX represents the return types as vectors (in the case of tcgen05.ld)
2. In frontends like MLIR, it is convenient to use vector types and pass them down to LLVM IR without converting them
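
To make the difference concrete, here is a rough sketch of how a caller would consume each form. `@ld_as_vector` and `@ld_as_struct` are hypothetical stand-ins, not the intrinsics from this patch, and the two-element width is only an assumption for illustration:

```llvm
; Illustrative only: @ld_as_vector and @ld_as_struct are hypothetical
; stand-ins, not the actual tcgen05.ld intrinsics.
declare <2 x i32> @ld_as_vector(ptr addrspace(6))
declare { i32, i32 } @ld_as_struct(ptr addrspace(6))

; Vector interface: the result is a single <2 x i32> value and
; elements are read with extractelement.
define i32 @sum_vector(ptr addrspace(6) %tmem_addr) {
  %v = call <2 x i32> @ld_as_vector(ptr addrspace(6) %tmem_addr)
  %a = extractelement <2 x i32> %v, i32 0
  %b = extractelement <2 x i32> %v, i32 1
  %s = add i32 %a, %b
  ret i32 %s
}

; List-of-types interface: the result is a { i32, i32 } aggregate and
; fields are read with extractvalue.
define i32 @sum_struct(ptr addrspace(6) %tmem_addr) {
  %r = call { i32, i32 } @ld_as_struct(ptr addrspace(6) %tmem_addr)
  %a = extractvalue { i32, i32 } %r, 0
  %b = extractvalue { i32, i32 } %r, 1
  %s = add i32 %a, %b
  ret i32 %s
}
```

With the vector form, a frontend that already holds a vector type can use the result directly instead of unpacking and repacking a struct.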

What do you think?

https://github.com/llvm/llvm-project/pull/126740