[llvm] [ISel] Introduce llvm.clmul[rh] intrinsics (PR #168731)
Sergei Barannikov via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 19 12:57:14 PST 2025
================
@@ -18389,6 +18387,153 @@ Example:
%r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11) ; %r = i8: 225 (0b11100001)
%r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8) ; %r = i8: 255 (0b11111111)
+.. _int_clmul:
+
+'``llvm.clmul.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmul`` on any integer
+bit width or any vector of integer elements.
+
+::
+
+ declare i16 @llvm.clmul.i16(i16 %a, i16 %b)
+ declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
+ declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
+ declare <4 x i32> @llvm.clmul.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview:
+"""""""""
+
+The '``llvm.clmul``' family of intrinsic functions performs carry-less
+multiplication (also known as XOR multiplication) on the two arguments and
+returns the low bits of the result.
+
+Arguments:
+""""""""""
+
+The arguments may be of any integer type or a vector of integer type. Both
+arguments and the result must have the same type.
+
+Semantics:
+""""""""""
+
+The '``llvm.clmul``' intrinsic computes the carry-less multiply of its
+arguments, which is the result of applying the standard long multiplication
+algorithm with every addition replaced by XOR, and returns the low bits.
+The vector variants operate lane-wise.
+
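+As an illustration only, the scalar semantics can be sketched in C; the
+helper name ``clmul32`` is hypothetical and not part of LLVM:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   /* Carry-less multiply: long multiplication with XOR instead of add. */
+   uint32_t clmul32(uint32_t a, uint32_t b) {
+     uint32_t r = 0;
+     for (int i = 0; i < 32; ++i)
+       if ((b >> i) & 1u)
+         r ^= a << i; /* partial products are combined with XOR */
+     return r;        /* low 32 bits of the 63-bit carry-less product */
+   }
+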
+Example:
+""""""""
+
+.. code-block:: llvm
+
+ %r = call i4 @llvm.clmul.i4(i4 1, i4 2)   ; %r = i4: 2 (0b0010)
+ %r = call i4 @llvm.clmul.i4(i4 5, i4 6)   ; %r = i4: 14 (0b1110)
+ %r = call i4 @llvm.clmul.i4(i4 -4, i4 2)  ; %r = i4: -8 (0b1000)
+ %r = call i4 @llvm.clmul.i4(i4 -4, i4 -5) ; %r = i4: 4 (0b0100)
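+
+For instance, in the second example, ``clmul(0b0101, 0b0110)`` expands to
+``(0b0101 << 1) xor (0b0101 << 2) = 0b01010 xor 0b10100 = 0b11110``; the low
+four bits are ``0b1110``, i.e. 14.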
+
+'``llvm.clmulr.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmulr`` on any integer
+bit width or any vector of integer elements.
+
+::
+
+ declare i16 @llvm.clmulr.i16(i16 %a, i16 %b)
+ declare i32 @llvm.clmulr.i32(i32 %a, i32 %b)
+ declare i64 @llvm.clmulr.i64(i64 %a, i64 %b)
+ declare <4 x i32> @llvm.clmulr.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview:
+"""""""""
+
+The '``llvm.clmulr``' family of intrinsic functions performs reversed
+carry-less multiplication on the two arguments.
+
+Arguments:
+""""""""""
+
+The arguments may be of any integer type or a vector of integer type. Both
+arguments and the result must have the same type.
+
+Semantics:
+""""""""""
+
+The '``llvm.clmulr``' intrinsic computes the reversed carry-less multiply of
+its arguments. The vector variants operate lane-wise. It is defined as:
+
+.. code-block:: text
+
+ clmulr(%a, %b) = bitreverse(clmul(bitreverse(%a), bitreverse(%b)))
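+
+As an illustration, this definition can be sketched in C, assuming the
+hypothetical ``clmul32`` helper from the '``llvm.clmul``' sketch above:
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   uint32_t clmul32(uint32_t a, uint32_t b); /* as sketched for llvm.clmul */
+
+   /* Reverse the bits of a 32-bit value. */
+   uint32_t bitreverse32(uint32_t v) {
+     uint32_t r = 0;
+     for (int i = 0; i < 32; ++i)
+       r |= ((v >> i) & 1u) << (31 - i);
+     return r;
+   }
+
+   /* Reversed carry-less multiply per the identity above. */
+   uint32_t clmulr32(uint32_t a, uint32_t b) {
+     return bitreverse32(clmul32(bitreverse32(a), bitreverse32(b)));
+   }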
----------------
s-barannikov wrote:
> Doesn't clmulr open up a new suite of bit-based optimizations in the middle-end though?
I don't know, that was my question as well
https://github.com/llvm/llvm-project/pull/168731