[llvm] r305414 - [x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 14 13:37:12 PDT 2017


Author: spatel
Date: Wed Jun 14 15:37:11 2017
New Revision: 305414

URL: http://llvm.org/viewvc/llvm-project?rev=305414&view=rev
Log:
[x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()

This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398.

We mentioned replacing the multiplies with shifts, but the real win seems to be in
bypassing the extra ops in the common case when the RootRatio and OpRatio are one.
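
As a rough illustration of why the two forms are interchangeable, here is a minimal standalone sketch. The helper names (scaleIdxMul, scaleIdxFast, formsAgree) are hypothetical and do not appear in X86ISelLowering.cpp, where the math is written inline. It assumes a non-negative mask value and a power-of-2 ratio with RatioLog2 == log2(Ratio).

```cpp
// Hypothetical helpers for illustration only; not part of the LLVM source.

// Old form: always multiply by the ratio and add the sub-element offset.
unsigned scaleIdxMul(unsigned MaskVal, unsigned i, unsigned Ratio) {
  return MaskVal * Ratio + (i & (Ratio - 1));
}

// New form: return the mask value unchanged when Ratio == 1 (the common
// case), otherwise shift left by log2(Ratio) instead of multiplying.
unsigned scaleIdxFast(unsigned MaskVal, unsigned i, unsigned Ratio,
                      unsigned RatioLog2) {
  return Ratio == 1 ? MaskVal
                    : (MaskVal << RatioLog2) + (i & (Ratio - 1));
}

// Exhaustively compares both forms for ratios 1, 2, 4, ..., 2^MaxLog2 and
// for all mask values and element indices below Limit.
bool formsAgree(unsigned MaxLog2, unsigned Limit) {
  for (unsigned Log2 = 0; Log2 <= MaxLog2; ++Log2) {
    unsigned Ratio = 1u << Log2;
    for (unsigned V = 0; V < Limit; ++V)
      for (unsigned i = 0; i < Limit; ++i)
        if (scaleIdxMul(V, i, Ratio) != scaleIdxFast(V, i, Ratio, Log2))
          return false;
  }
  return true;
}
```

When Ratio is 1, the term (i & (Ratio - 1)) is always zero and the multiply is by one, so the whole expression reduces to the mask value itself, which is what the early-out returns.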

This gives us another 1-2% overall win for the test in PR32037:
https://bugs.llvm.org/show_bug.cgi?id=32037

Modified:
    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=305414&r1=305413&r2=305414&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Jun 14 15:37:11 2017
@@ -28005,10 +28005,10 @@ static bool combineX86ShufflesRecursivel
       continue;
     }
 
-    // TODO: Here and below, we could convert multiply to shift-left for
-    // performance because we know that our mask sizes are power-of-2.
     unsigned RootMaskedIdx =
-        RootMask[RootIdx] * RootRatio + (i & (RootRatio - 1));
+        RootRatio == 1
+            ? RootMask[RootIdx]
+            : (RootMask[RootIdx] << RootRatioLog2) + (i & (RootRatio - 1));
 
     // Just insert the scaled root mask value if it references an input other
     // than the SrcOp we're currently inserting.
@@ -28019,7 +28019,6 @@ static bool combineX86ShufflesRecursivel
     }
 
     RootMaskedIdx = RootMaskedIdx & (MaskWidth - 1);
-
     unsigned OpIdx = RootMaskedIdx >> OpRatioLog2;
     if (OpMask[OpIdx] < 0) {
       // The incoming lanes are zero or undef, it doesn't matter which ones we
@@ -28030,9 +28029,11 @@ static bool combineX86ShufflesRecursivel
 
     // Ok, we have non-zero lanes, map them through to one of the Op's inputs.
     unsigned OpMaskedIdx =
-        OpMask[OpIdx] * OpRatio + (RootMaskedIdx & (OpRatio - 1));
-    OpMaskedIdx = OpMaskedIdx & (MaskWidth - 1);
+        OpRatio == 1
+            ? OpMask[OpIdx]
+            : (OpMask[OpIdx] << OpRatioLog2) + (RootMaskedIdx & (OpRatio - 1));
 
+    OpMaskedIdx = OpMaskedIdx & (MaskWidth - 1);
     if (OpMask[OpIdx] < (int)OpMask.size()) {
       assert(0 <= InputIdx0 && "Unknown target shuffle input");
       OpMaskedIdx += InputIdx0 * MaskWidth;
