[llvm] [NVPTX] Add intrinsic support for specialized prmt variants (PR #140951)

Durgadoss R via llvm-commits llvm-commits at lists.llvm.org
Thu May 22 06:15:19 PDT 2025


================
@@ -661,6 +661,126 @@ all bits set to 0 except for %b bits starting at bit position %a. For the
 '``clamp``' variants, the values of %a and %b are clamped to the range [0, 32],
 which in practice is equivalent to using them as is.
 
+'``llvm.nvvm.prmt``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+    declare i32 @llvm.nvvm.prmt(i32 %a, i32 %b, i32 %c)
+
+Overview:
+"""""""""
+
+The '``llvm.nvvm.prmt``' intrinsic constructs a permutation of the bytes of the
+first two operands, selecting bytes based on the third operand.
+
+Semantics:
+""""""""""
+
+The bytes in the first two source operands are numbered from 0 to 7:
+{%b, %a} = {{b7, b6, b5, b4}, {b3, b2, b1, b0}}. For each byte of the result,
+a 4-bit selection value is defined.
+
+The three least significant bits of the selection value specify which of the 8
+source bytes is moved into the target position. The most significant bit
+selects between copying the byte value as is (msb=0) and replicating the sign
+of the byte (its msb) across all 8 bits of the target position, i.e.
+sign-extending the byte (msb=1).
+
+These 4-bit selection values are taken from the lower 16 bits of the third
+operand, with the least significant selection value corresponding to the least
+significant byte of the destination.
+
+
+'``llvm.nvvm.prmt.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+    declare i32 @llvm.nvvm.prmt.f4e(i32 %a, i32 %b, i32 %c)
+    declare i32 @llvm.nvvm.prmt.b4e(i32 %a, i32 %b, i32 %c)
+
+    declare i32 @llvm.nvvm.prmt.rc8(i32 %a, i32 %c)
+    declare i32 @llvm.nvvm.prmt.ecl(i32 %a, i32 %c)
+    declare i32 @llvm.nvvm.prmt.ecr(i32 %a, i32 %c)
+    declare i32 @llvm.nvvm.prmt.rc16(i32 %a, i32 %c)
+
+Overview:
+"""""""""
+
+The '``llvm.nvvm.prmt.*``' family of intrinsics constructs a permutation of the
+bytes of the first one or two operands, with the permutation pattern selected
+by the two least significant bits of the final operand (%c).
+
+Semantics:
+""""""""""
+
+As with the generic '``llvm.nvvm.prmt``' intrinsic, the bytes of the source
+operands are numbered. The bytes of the first source operand (%a) are numbered
+{b3, b2, b1, b0}. For the '``f4e``' and '``b4e``' variants, which also take a
+second source operand (%b), its bytes are numbered {b7, b6, b5, b4}.
+
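+Depending on the two least significant bits of the final operand, the result
+of the permutation is defined as follows. Each Output entry lists, from the
+most significant result byte to the least significant, the number of the
+source byte copied into that position: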
+
++------------+---------+--------------+
+|    Mode    | %c[1:0] |    Output    |
++------------+---------+--------------+
+| '``f4e``'  |   0     | {3, 2, 1, 0} |
+|            +---------+--------------+
+|            |   1     | {4, 3, 2, 1} |
+|            +---------+--------------+
+|            |   2     | {5, 4, 3, 2} |
+|            +---------+--------------+
+|            |   3     | {6, 5, 4, 3} |
++------------+---------+--------------+
+| '``b4e``'  |   0     | {5, 6, 7, 0} |
+|            +---------+--------------+
+|            |   1     | {6, 7, 0, 1} |
+|            +---------+--------------+
+|            |   2     | {7, 0, 1, 2} |
+|            +---------+--------------+
+|            |   3     | {0, 1, 2, 3} |
++------------+---------+--------------+
+| '``rc8``'  |   0     | {0, 0, 0, 0} |
+|            +---------+--------------+
+|            |   1     | {1, 1, 1, 1} |
+|            +---------+--------------+
+|            |   2     | {2, 2, 2, 2} |
+|            +---------+--------------+
+|            |   3     | {3, 3, 3, 3} |
++------------+---------+--------------+
+| '``ecl``'  |   0     | {3, 2, 1, 0} |
+|            +---------+--------------+
+|            |   1     | {3, 2, 1, 1} |
+|            +---------+--------------+
+|            |   2     | {3, 2, 2, 2} |
+|            +---------+--------------+
+|            |   3     | {3, 3, 3, 3} |
++------------+---------+--------------+
+| '``ecr``'  |   0     | {0, 0, 0, 0} |
+|            +---------+--------------+
+|            |   1     | {1, 1, 1, 0} |
+|            +---------+--------------+
+|            |   2     | {2, 2, 1, 0} |
+|            +---------+--------------+
+|            |   3     | {3, 2, 1, 0} |
++------------+---------+--------------+
+| '``rc16``' |   0     | {1, 0, 1, 0} |
+|            +---------+--------------+
+|            |   1     | {3, 2, 3, 2} |
+|            +---------+--------------+
+|            |   2     | {1, 0, 1, 0} |
+|            +---------+--------------+
+|            |   3     | {3, 2, 3, 2} |
++------------+---------+--------------+
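The byte-selection rules and the mode table above can be cross-checked with a small host-side model. The following is a minimal C sketch, not part of the patch: the names `prmt_model`, `prmt_mode`, `kModeTable` and `prmt_mode_model` are hypothetical, the per-mode byte numbers are transcribed from the table rather than taken from the NVPTX backend, and it assumes that each specialized mode behaves like the generic permutation with an all-copy selector built from those numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of the generic prmt semantics:
 * source bytes are numbered {b7, b6, b5, b4} = %b and {b3, b2, b1, b0} = %a. */
uint32_t prmt_model(uint32_t a, uint32_t b, uint32_t c) {
    uint64_t src = ((uint64_t)b << 32) | a; /* byte i of src is source byte bi */
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t sel = (c >> (4 * i)) & 0xFu; /* 4-bit selector for result byte i */
        uint8_t byte = (uint8_t)(src >> (8 * (sel & 0x7u))); /* 3 lsbs pick the byte */
        if (sel & 0x8u) /* selector msb set: replicate the sign bit of the byte */
            byte = (byte & 0x80u) ? 0xFFu : 0x00u;
        result |= (uint32_t)byte << (8 * i);
    }
    return result;
}

/* Hypothetical enum naming the specialized modes. */
typedef enum { PRMT_F4E, PRMT_B4E, PRMT_RC8, PRMT_ECL, PRMT_ECR, PRMT_RC16 } prmt_mode;

/* Source byte numbers per mode and c[1:0], listed as {d.b3, d.b2, d.b1, d.b0}
 * (most significant result byte first), transcribed from the table. */
static const uint8_t kModeTable[6][4][4] = {
    {{3, 2, 1, 0}, {4, 3, 2, 1}, {5, 4, 3, 2}, {6, 5, 4, 3}}, /* f4e  */
    {{5, 6, 7, 0}, {6, 7, 0, 1}, {7, 0, 1, 2}, {0, 1, 2, 3}}, /* b4e  */
    {{0, 0, 0, 0}, {1, 1, 1, 1}, {2, 2, 2, 2}, {3, 3, 3, 3}}, /* rc8  */
    {{3, 2, 1, 0}, {3, 2, 1, 1}, {3, 2, 2, 2}, {3, 3, 3, 3}}, /* ecl  */
    {{0, 0, 0, 0}, {1, 1, 1, 0}, {2, 2, 1, 0}, {3, 2, 1, 0}}, /* ecr  */
    {{1, 0, 1, 0}, {3, 2, 3, 2}, {1, 0, 1, 0}, {3, 2, 3, 2}}, /* rc16 */
};

/* Models a specialized mode by building the equivalent generic selector, in
 * which every nibble copies (msb = 0) the source byte named in the table.
 * For rc8, ecl, ecr and rc16, pass b = 0: their byte numbers are all below 4,
 * so no byte of b is ever selected. */
uint32_t prmt_mode_model(prmt_mode mode, uint32_t a, uint32_t b, uint32_t c) {
    assert((unsigned)mode <= PRMT_RC16);
    const uint8_t *idx = kModeTable[mode][c & 0x3u]; /* only c[1:0] is used */
    uint32_t sel = 0;
    for (int i = 0; i < 4; ++i)
        sel |= (uint32_t)idx[3 - i] << (4 * i); /* idx[3] is d.b0 -> nibble 0 */
    return prmt_model(a, b, sel);
}
```

For example, with a = 0x33221100 and b = 0x77665544 (so that source byte bi holds 0x11 * i), the selector 0x3210 returns a unchanged, 0x7654 returns b, and `prmt_mode_model(PRMT_F4E, a, b, 1)` returns 0x44332211. With a = 0x80, the selector 0x8880 sign-extends byte 0 to 0xFFFFFF80.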
----------------
durga4github wrote:

Nice to see the table itself here. Thank you!

https://github.com/llvm/llvm-project/pull/140951
