[llvm] [NVPTX] Add TMA bulk tensor reduction intrinsics (PR #116854)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 22 11:21:49 PST 2024
================
@@ -663,6 +663,84 @@ the same functionality as described in the ``tile`` mode intrinsics above.
For more information, refer PTX ISA
`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch-tensor>`_.
+'``llvm.nvvm.cp.async.bulk.tensor.reduce.tile.[1-5]d``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.reduce.tile.1d(ptr addrspace(3) %src, ptr %tensor_map, i32 %d0, i64 %ch, i1 %flag_ch, i8 %flag_red_op)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.reduce.tile.2d(..., i32 %d0, i32 %d1, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.reduce.tile.3d(..., i32 %d0, i32 %d1, i32 %d2, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.reduce.tile.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.reduce.tile.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.cp.async.bulk.tensor.reduce.tile.[1-5]d``' intrinsics
+correspond to the ``cp.reduce.async.bulk.tensor.[1-5]d.*`` set of PTX instructions.
+These instructions initiate an asynchronous reduction operation of tensor data
+in global memory with the tensor data in shared{::cta} memory, using ``tile`` mode.
+The dimension of the tensor data ranges from 1d to 5d with the coordinates
+specified by the ``i32 %d0 ... i32 %d4`` arguments.
+
+* The last two arguments to these intrinsics are flags.
+ These flag arguments must be compile-time constants. The backend
+ looks through these flags and lowers the intrinsics appropriately.
+
+* The Nth argument (denoted by ``i8 flag_red_op``) indicates the
+ kind of reduction operation performed. The argument must be in
+ the range [0, 7], representing the following reduction operations:
+
+ ========== =============
+ Enum Value Reduction Op
+ ========== =============
+ ``0`` ADD
+ ``1`` MIN
+ ``2`` MAX
+ ``3`` INC
+ ``4`` DEC
+ ``5`` AND
+ ``6`` OR
+ ``7`` XOR
+ ========== =============
----------------
Artem-B wrote:
This particular argument should probably be part of the intrinsic name, as it describes *what* we want to do, not the parameters of the particular operation.
https://github.com/llvm/llvm-project/pull/116854
More information about the llvm-commits
mailing list