[llvm] [ISel] Introduce llvm.clmul intrinsic (PR #168731)

Thu Nov 27 13:18:38 PST 2025

davemgreen wrote:

Looking at https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3642r2.html#hardware-support and from what I can tell of other architectures, x86 has a pclmulqdq instruction that performs clmul(zext(a), zext(b)) from i64 to i128, if I have that correct. An immediate operand on the instruction selects which of the top/bottom halves of each input vector to use, but it doesn't change the behaviour.

Arm and AArch64 are the ones I am most familiar with. They have a PMUL that performs v16i8->v16i8 and a PMULL that performs v8i8->v8i16 or i64->i128 multiplies. There are also SVE variants that work the same way as far as I can tell, so they too perform clmul(zext, zext).
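
To make the clmul(zext, zext) shape concrete, here is a minimal scalar sketch of one 8-bit lane. The function names are made up for illustration and are not LLVM or ACLE APIs; the widening form is meant to model a single PMULL-style lane (i8 -> i16) and the truncating form a single PMUL-style lane (i8 -> i8), assuming my reading of those instructions above is right.

```cpp
#include <cstdint>

// Hypothetical helper: widening carry-less multiply of one 8-bit lane,
// i.e. clmul(zext(a), zext(b)) with a 16-bit result.
uint16_t clmul_widen8(uint8_t a, uint8_t b) {
    uint16_t result = 0;
    for (int i = 0; i < 8; ++i) {
        // For every set bit i of b, xor in a shifted left by i (no carries).
        if ((b >> i) & 1)
            result ^= static_cast<uint16_t>(static_cast<uint16_t>(a) << i);
    }
    return result;
}

// Hypothetical helper: same product, keeping only the low 8 bits.
uint8_t clmul_trunc8(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(clmul_widen8(a, b));
}
```

For example, clmul_widen8(0x03, 0x03) is 0x05, since (x + 1)^2 = x^2 + 1 over GF(2).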

PowerPC seems to have VPMSUM, which performs multiple clmul(zext, zext) operations and xors the results together, if I am reading it correctly. RISC-V has two or three operations that produce the bottom and top halves separately. A generic DAG combine that turns clmul(zext, zext) into clmulh sounds like it would not be beneficial on many targets without a check that it is worthwhile. If you plan to add a RISCVISD::CLMUL, could it perform the optimization to produce it directly?
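
As a rough illustration of the split into halves, the following scalar sketch computes the low and high 64 bits of the 128-bit carry-less product separately. The names clmul_lo64 and clmul_hi64 are invented for this example; they are intended to model the RISC-V clmul and clmulh semantics as I understand them, and the high half is what the top 64 bits of clmul(zext(a), zext(b)) at i128 would be.

```cpp
#include <cstdint>

// Hypothetical model of the low half: bits [63:0] of the 128-bit product.
uint64_t clmul_lo64(uint64_t a, uint64_t b) {
    uint64_t result = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1)
            result ^= a << i;  // bits shifted past bit 63 are dropped
    }
    return result;
}

// Hypothetical model of the high half: bits [127:64] of the 128-bit product.
uint64_t clmul_hi64(uint64_t a, uint64_t b) {
    uint64_t result = 0;
    // Start at i = 1: bit 0 of b contributes nothing above bit 63,
    // and a shift by 64 would be undefined behaviour in C++.
    for (int i = 1; i < 64; ++i) {
        if ((b >> i) & 1)
            result ^= a >> (64 - i);  // the bits that a << i pushed out
    }
    return result;
}
```

A target with only a widening instruction gets both halves from one operation, which is why rewriting clmul(zext, zext) into a separate high-half node is only a win where the target actually has one.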

https://github.com/llvm/llvm-project/pull/168731