[llvm] [LLVM][NVPTX] Add codegen support for tcgen05.{ld, st} instructions (PR #126740)

Durgadoss R via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 12 04:15:34 PST 2025


================
@@ -1175,6 +1175,101 @@ For more information, refer to the PTX ISA
 `<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensorcore-5th-generation-instructions-tcgen05-fence>`_.
 
 
+'``llvm.nvvm.tcgen05.ld.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare <n x i32> @llvm.nvvm.tcgen05.ld.<shape>.<num>(ptr addrspace(6) %tmem_addr, i1 %pack)
+
+  declare <n x i32> @llvm.nvvm.tcgen05.ld.16x32bx2.<num>(ptr addrspace(6) %tmem_addr, i64 %offset, i1 %pack)
----------------
durga4github wrote:

> Thanks for the review! :)
> 
> > Pack is an odd one out here. Is there a particular reason to make it a parameter rather than the name?
> 
> The main reason is to halve the number of intrinsics handled during lowering. Currently, we have 37 intrinsics for tcgen05.ld alone; making `pack` an intrinsic modifier would increase that to 74, and the same goes for tcgen05.st. We did not make the other arguments, `shape` and `num`, into parameters because some `shape` and `num` combinations are invalid, which we felt would be confusing to the user.
> 
> > We should summarize and document the rules of thumb we've established so far, so we do not have to bikeshed over naming on every new PTX intrinsic.
> 
> @durga4github and I will submit a follow-up patch on the rules of thumb for designing new intrinsics for the NVPTX backend in the NVPTXUsage guide.

Yes, we intend to do this. I have one more simple tcgen05 item that I am finishing up and will submit the Doc updates after that.
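
For reference, a rough sketch of what keeping `pack` as a flag argument looks like at the IR level. The `16x64b` shape and `x2` count are assumed here purely as one plausible valid combination, and the function names `load_unpacked` and `load_packed` are hypothetical; the exact intrinsic names are whatever the patch finally lands with.

```llvm
; Assumed instance of @llvm.nvvm.tcgen05.ld.<shape>.<num>, with
; shape = 16x64b and num = x2 chosen only for illustration.
declare <2 x i32> @llvm.nvvm.tcgen05.ld.16x64b.x2(ptr addrspace(6), i1)

; Hypothetical caller: same intrinsic, pack flag cleared.
define <2 x i32> @load_unpacked(ptr addrspace(6) %tmem_addr) {
  %v = call <2 x i32> @llvm.nvvm.tcgen05.ld.16x64b.x2(ptr addrspace(6) %tmem_addr, i1 false)
  ret <2 x i32> %v
}

; Hypothetical caller: same intrinsic, pack flag set.
define <2 x i32> @load_packed(ptr addrspace(6) %tmem_addr) {
  %v = call <2 x i32> @llvm.nvvm.tcgen05.ld.16x64b.x2(ptr addrspace(6) %tmem_addr, i1 true)
  ret <2 x i32> %v
}
```

Both callers share one declaration. Had `pack` been encoded in the name, each of the 37 tcgen05.ld intrinsics would need a second variant, which is where the 74 comes from.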

https://github.com/llvm/llvm-project/pull/126740

