[llvm] [X86] Attempt to use VPMADD52L/VPMULUDQ instead of VPMULLQ on slow VPMULLQ targets (or when VPMULLQ is unavailable) (PR #171760)

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 15 07:46:58 PST 2025


================
@@ -49926,6 +49873,40 @@ static SDValue combineMul(SDNode *N, SelectionDAG &DAG,
   if (SDValue V = combineMulToPMULDQ(N, DL, DAG, Subtarget))
     return V;
 
+  if (VT.getScalarType() == MVT::i64 && Subtarget.isPMULLQSlow()) {
+    SDValue Op0 = N->getOperand(0);
+    SDValue Op1 = N->getOperand(1);
+
+    KnownBits Known0 = DAG.computeKnownBits(Op0);
+    KnownBits Known1 = DAG.computeKnownBits(Op1);
+    unsigned Count0 = Known0.countMinLeadingZeros();
+    unsigned Count1 = Known1.countMinLeadingZeros();
+
+    // Optimization 1: Use VPMULUDQ (32-bit multiply).
+    if (Count0 >= 32 && Count1 >= 32)
+      return DAG.getNode(X86ISD::PMULUDQ, DL, VT, Op0, Op1);
+
+    // Optimization 1.5: Use PMULDQ (32-bit signed multiply).
+    unsigned Sign0 = DAG.ComputeNumSignBits(Op0);
+    unsigned Sign1 = DAG.ComputeNumSignBits(Op1);
+    if (Sign0 > 32 && Sign1 > 32)
+      return DAG.getNode(X86ISD::PMULDQ, DL, VT, Op0, Op1);
+
+    // Optimization 2: Use VPMADD52L (52-bit multiply-add).
+    if (Subtarget.hasIFMA() || Subtarget.hasAVXIFMA()) {
+      if (VT.getSizeInBits() == 512 || Subtarget.hasVLX() ||
+          Subtarget.hasAVXIFMA()) {
----------------
RKSimon wrote:

What happens if there is just hasAVXIFMA and VT.getSizeInBits() == 512? AVX-IFMA should only support v2i64/v4i64.
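
For illustration only, here is a minimal standalone sketch of the width check in question. It is not LLVM code: the `Features` struct and both function names are hypothetical stand-ins for the subtarget queries in the hunk. It assumes the usual ISA rules: 512-bit VPMADD52 needs AVX512-IFMA, the 128/256-bit EVEX forms additionally need AVX512VL, and AVX-IFMA provides only the 128/256-bit VEX forms.

```cpp
#include <cassert>

// Hypothetical stand-in for the X86 subtarget feature queries (not LLVM's API).
struct Features {
  bool IFMA;    // AVX512-IFMA: EVEX-encoded VPMADD52
  bool VLX;     // AVX512VL: 128/256-bit forms of EVEX instructions
  bool AVXIFMA; // AVX-IFMA: VEX-encoded VPMADD52, 128/256-bit only
};

// The guard as written in the hunk above, keyed on the vector width in bits.
constexpr bool hunkCondition(const Features &F, unsigned Bits) {
  return (F.IFMA || F.AVXIFMA) && (Bits == 512 || F.VLX || F.AVXIFMA);
}

// A width-aware variant: true only if some VPMADD52 form exists for Bits.
constexpr bool hasMadd52ForWidth(const Features &F, unsigned Bits) {
  if (Bits == 512)
    return F.IFMA;
  if (Bits == 128 || Bits == 256)
    return (F.IFMA && F.VLX) || F.AVXIFMA;
  return false;
}

// Demonstrates the case raised in the review: AVX-IFMA alone, 512-bit type.
inline void demo() {
  constexpr Features OnlyAVXIFMA{false, false, true};
  assert(hunkCondition(OnlyAVXIFMA, 512));      // guard accepts it
  assert(!hasMadd52ForWidth(OnlyAVXIFMA, 512)); // no such instruction form
  assert(hasMadd52ForWidth(OnlyAVXIFMA, 256));  // v4i64 is fine
  assert(hasMadd52ForWidth(OnlyAVXIFMA, 128));  // v2i64 is fine
}
```

Whether a 512-bit type can actually reach this code on an AVX-IFMA-only target depends on context not shown in the hunk; the sketch only shows that the guard by itself does not rule it out.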

https://github.com/llvm/llvm-project/pull/171760

