[llvm] [X86] Attempt to use VPMADD52L/VPMULUDQ instead of VPMULLQ on slow VPMULLQ targets (or when VPMULLQ is unavailable) (PR #171760)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Sat Dec 13 07:01:02 PST 2025
================
@@ -49926,6 +49873,39 @@ static SDValue combineMul(SDNode *N, SelectionDAG &DAG,
   if (SDValue V = combineMulToPMULDQ(N, DL, DAG, Subtarget))
     return V;
+  // ==============================================================
+  // Optimize VPMULLQ on slow targets
+  // ==============================================================
+  if (VT.getScalarType() == MVT::i64 && Subtarget.hasSlowPMULLQ()) {
+    SDValue Op0 = N->getOperand(0);
+    SDValue Op1 = N->getOperand(1);
+
+    KnownBits Known0 = DAG.computeKnownBits(Op0);
+    KnownBits Known1 = DAG.computeKnownBits(Op1);
+    unsigned Count0 = Known0.countMinLeadingZeros();
+    unsigned Count1 = Known1.countMinLeadingZeros();
+
+    // Optimization 1: Use VPMULUDQ (32-bit multiply).
+    // If the upper 32 bits are zero, we can use the standard PMULUDQ
+    // instruction. This is generally the fastest option and widely supported.
+    if (Count0 >= 32 && Count1 >= 32) {
+      return DAG.getNode(X86ISD::PMULUDQ, DL, VT, Op0, Op1);
+    }
----------------
RKSimon wrote:
(style) unnecessary braces: the `if (Count0 >= 32 && Count1 >= 32)` body is a single statement, so the braces can be dropped
https://github.com/llvm/llvm-project/pull/171760