[llvm] [ISel] Introduce llvm.clmul intrinsic (PR #168731)

Thu Nov 27 16:13:02 PST 2025

artagnon wrote:

This is a very interesting discussion, thanks! I had the same doubt when investigating CLMUL on other architectures, and wondered whether it would be better to introduce a widening version of CLMUL that takes iX operands and produces an i2X result. I gave it some thought, and I think the convention for instructions and intrinsics in LLVM is to not do that: they always return iX when consuming iX, with separate ISD opcodes to handle the overflow bits (I was looking at MUL and MULH[US]).

Yes, these targets would need to pattern-match clmul(zext, zext) for custom-lowering to PMULL, but what I think we're missing in this discussion is that CLMULH is `clmul(zext, zext) >> BW`, i.e. just the high bits. A clmul(zext, zext) would be left as-is by our patch, and targets that have a widening clmul can pattern-match it and generate PMULL.

Note that the CLMUL[RH] patterns matched by DAGCombiner come in two forms: clmul(bitreverse, bitreverse) [>> 1] and clmul(zext, zext) >> BW [- 1]. The generic lowering tries to convert the first form to the second, resulting in more efficient lowering; the second form, once matched by DAGCombiner, has an exactly equivalent lowering in the generic case (it's as if the DAGCombiner pattern never existed).

This has two benefits on widening-clmul targets: they don't need to pattern-match clmul(bitreverse, bitreverse) >> 1 (equivalent to clmul(zext, zext) >> BW), and they don't need a CLMULH custom-lowering. They can simply have a CLMUL custom-lowering, and if the user writes a CLMULH in IR (see test cases), it will automatically be turned into the most efficient implementation on that target! So, in the specific case of PMULL, we would see a clmul(zext, zext) >> BW on the target (expansion of CLMULH by the generic lowering), pattern-match clmul(zext, zext) to PMULL in the usual way, and end up generating PMULL >> BW on the target.
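To make the equivalence between the two forms concrete, here is a small bit-level model in Python. It is only a sketch, not code from the patch: the 8-bit width and all function names are chosen for illustration, and the outer `bitreverse` in the first form is my reading of the shorthand above (it matches the RISC-V Zbc definition of CLMULR).

```python
BW = 8                      # illustrative bit width (assumption, not from the PR)
MASK = (1 << BW) - 1


def clmul_wide(a, b):
    """Full 2*BW-bit carry-less product, modelling clmul(zext a, zext b)."""
    r = 0
    for i in range(BW):
        if (b >> i) & 1:
            r ^= a << i     # XOR instead of ADD: no carries propagate
    return r


def clmul(a, b):
    """iX -> iX carry-less multiply: the low BW bits of the product."""
    return clmul_wide(a, b) & MASK


def bitreverse(v):
    """Reverse the BW bits of v."""
    return int(format(v, f"0{BW}b")[::-1], 2)


def clmulh(a, b):
    """Second form: clmul(zext, zext) >> BW, the high half of the product."""
    return (clmul_wide(a, b) >> BW) & MASK


def clmulr(a, b):
    """Second form: clmul(zext, zext) >> (BW - 1), truncated to BW bits."""
    return (clmul_wide(a, b) >> (BW - 1)) & MASK


def clmulr_via_bitreverse(a, b):
    """First form: bitreverse(clmul(bitreverse a, bitreverse b))."""
    return bitreverse(clmul(bitreverse(a), bitreverse(b)))


def clmulh_via_bitreverse(a, b):
    """First form with the extra shift: the CLMULR result >> 1."""
    return clmulr_via_bitreverse(a, b) >> 1
```

Checking all 8-bit operand pairs shows that `clmulr` agrees with `clmulr_via_bitreverse` and `clmulh` agrees with `clmulh_via_bitreverse`, so a target that only recognises the narrow `clmul` and the widening `clmul(zext, zext)` gets CLMULH and CLMULR through shifts alone.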

For clarification, RISCVISD::CLMUL[RH] already exists, and we're trying to introduce an llvm.clmul intrinsic with a generic lowering, with the intent that any target can custom-lower it. Sorry if I gave the impression that I'm trying to do something RISC-V specific in the generic lowering: it just so happens that the RISC-V instructions match the LLVM IR intrinsic convention, and I've used the same names. Kindly let me know if you find any holes in my reasoning/design.

https://github.com/llvm/llvm-project/pull/168731