[llvm] [X86] Attempt to use VPMADD52L/VPMULUDQ instead of VPMULLQ on slow VPMULLQ targets (or when VPMULLQ is unavailable) (PR #171760)

via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 10 20:56:54 PST 2025


github-actions[bot] wrote:

<!--LLVM CODE FORMAT COMMENT: {clang-format}-->


:warning: The C/C++ code formatter, clang-format, found issues in your code. :warning:

<details>
<summary>
You can test this locally with the following command:
</summary>

``````````bash
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/X86/X86ISelLowering.cpp --diff_from_common_commit
``````````

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:

</details>

<details>
<summary>
View the diff from clang-format here.
</summary>

``````````diff
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 346af7d60..2ad5ee6f6 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -49938,15 +49938,15 @@ static SDValue combineMul(SDNode *N, SelectionDAG &DAG,
     unsigned Count1 = Known1.countMinLeadingZeros();
 
     // Optimization 1: Use VPMULUDQ (32-bit multiply).
-    // If the upper 32 bits are zero, we can use the standard PMULUDQ instruction.
-    // This is generally the fastest option and widely supported.
+    // If the upper 32 bits are zero, we can use the standard PMULUDQ
+    // instruction. This is generally the fastest option and widely supported.
     if (Count0 >= 32 && Count1 >= 32) {
       return DAG.getNode(X86ISD::PMULUDQ, DL, VT, Op0, Op1);
     }
 
     // Optimization 2: Use VPMADD52L (52-bit multiply-add).
-    // On targets with slow VPMULLQ (e.g., Ice Lake), 
-    //VPMADD52L is significantly faster (lower latency/better throughput).
+    // On targets with slow VPMULLQ (e.g., Ice Lake),
+    // VPMADD52L is significantly faster (lower latency/better throughput).
     if (Subtarget.hasAVX512() && Subtarget.hasIFMA()) {
       if (Count0 >= 12 && Count1 >= 12) {
         SDValue Zero = getZeroVector(VT.getSimpleVT(), Subtarget, DAG, DL);

``````````

</details>
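
For context on the two comments in the diff above, here is a minimal scalar sketch of one 64-bit lane of each instruction. The helper names (`mullq`, `pmuludq`, `pmadd52luq`) are hypothetical and are not LLVM APIs; they only model the documented lane arithmetic. The sketch suggests why the leading-zero counts matter: PMULUDQ reads only the low 32 bits of each lane, and VPMADD52LUQ adds only the low 52 bits of the product to the accumulator. The diff is truncated before the VPMADD52L node is built, so how the PR guards or uses that result is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of a single 64-bit lane (not LLVM code).

// VPMULLQ: low 64 bits of the 64x64 product.
constexpr std::uint64_t mullq(std::uint64_t a, std::uint64_t b) {
  return a * b; // unsigned arithmetic wraps modulo 2^64
}

// PMULUDQ: multiplies only the low 32 bits of each operand, giving 64 bits.
constexpr std::uint64_t pmuludq(std::uint64_t a, std::uint64_t b) {
  return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// VPMADD52LUQ: acc + low 52 bits of (a[51:0] * b[51:0]).
// The low 52 bits of the product are unaffected by 64-bit wraparound,
// so a plain 64-bit multiply followed by a mask is enough here.
constexpr std::uint64_t pmadd52luq(std::uint64_t acc, std::uint64_t a,
                                   std::uint64_t b) {
  constexpr std::uint64_t Mask52 = (1ull << 52) - 1;
  return acc + (((a & Mask52) * (b & Mask52)) & Mask52);
}

// Upper 32 bits zero in both operands: PMULUDQ matches VPMULLQ.
static_assert(pmuludq(0xFFFFFFFFull, 0xFFFFFFFFull) ==
              mullq(0xFFFFFFFFull, 0xFFFFFFFFull));
// An operand with bits above bit 31 set: the results differ.
static_assert(pmuludq(1ull << 32, 3) != mullq(1ull << 32, 3));

// Product fits in 52 bits: VPMADD52LUQ with a zero accumulator matches.
static_assert(pmadd52luq(0, 3, 5) == mullq(3, 5));
// Both operands fit in 52 bits, but the product (2^52) does not.
static_assert(pmadd52luq(0, 1ull << 26, 1ull << 26) !=
              mullq(1ull << 26, 1ull << 26));
```

In this model, operands that each fit in 52 bits are not sufficient by themselves: the product must also fit in 52 bits for the low-52 result to equal the low 64 bits that VPMULLQ returns.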


https://github.com/llvm/llvm-project/pull/171760

