[PATCH] D34174: [x86] replace div/rem with shift/mask for shuffle combining
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 13 14:49:20 PDT 2017
spatel created this revision.
Herald added a subscriber: mcrosier.
We know that shuffle mask sizes are always powers of 2, but there's no way (?) for LLVM to know that, so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.
This makes the test case from the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) compile about 9% faster overall.
I didn't bother with turning the multiplies into shifts, but we could check the perf from that transform and do that too if this isn't going too far already. :)
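For anyone who wants to see the identity in isolation, here is a minimal standalone sketch. It is not the code in the patch: the helper names (isPowerOf2, log2OfPowerOf2, quotientShift, remainderMask, scaleIndexDivRem, scaleIndexShiftMask) are made up for illustration and it does not use the LLVM headers. It only shows that, for an unsigned index and a power-of-2 ratio, i / Ratio equals i >> log2(Ratio) and i % Ratio equals i & (Ratio - 1).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not the LLVM
// MathExtras functions used in the patch.
static bool isPowerOf2(uint32_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Returns log2(X) for a power-of-2 X by counting how often it can be halved.
static unsigned log2OfPowerOf2(uint32_t X) {
  assert(isPowerOf2(X) && "expected a power of 2");
  unsigned Log2 = 0;
  while ((X >>= 1) != 0)
    ++Log2;
  return Log2;
}

// Idx / Ratio, computed as a right shift. Valid only for power-of-2 Ratio.
static uint32_t quotientShift(uint32_t Idx, uint32_t Ratio) {
  return Idx >> log2OfPowerOf2(Ratio);
}

// Idx % Ratio, computed as a bit-mask. Valid only for power-of-2 Ratio.
static uint32_t remainderMask(uint32_t Idx, uint32_t Ratio) {
  assert(isPowerOf2(Ratio) && "expected a power of 2");
  return Idx & (Ratio - 1);
}

// Reference form with a real remainder op, shaped like the scaled-index
// expression in the loop.
static uint32_t scaleIndexDivRem(uint32_t Idx, uint32_t MaskVal,
                                 uint32_t Ratio) {
  return MaskVal * Ratio + Idx % Ratio;
}

// Same computation with the remainder replaced by a mask. The multiply is
// also written as a shift here, which the patch itself does not do.
static uint32_t scaleIndexShiftMask(uint32_t Idx, uint32_t MaskVal,
                                    uint32_t Ratio) {
  return (MaskVal << log2OfPowerOf2(Ratio)) + remainderMask(Idx, Ratio);
}
```

The sketch sticks to unsigned values on purpose: with signed operands, division rounds toward zero while an arithmetic shift rounds toward negative infinity, so the two forms differ for negative inputs.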
https://reviews.llvm.org/D34174
Files:
lib/Target/X86/X86ISelLowering.cpp
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -27970,28 +27970,43 @@
OpMask.size() % RootMask.size() == 0) ||
OpMask.size() == RootMask.size()) &&
"The smaller number of elements must divide the larger.");
- int MaskWidth = std::max<int>(OpMask.size(), RootMask.size());
- int RootRatio = std::max<int>(1, OpMask.size() / RootMask.size());
- int OpRatio = std::max<int>(1, RootMask.size() / OpMask.size());
- assert(((RootRatio == 1 && OpRatio == 1) ||
- (RootRatio == 1) != (OpRatio == 1)) &&
+
+ // This function can be performance-critical, so we rely on the power-of-2
+ // knowledge that we have about the mask sizes to replace div/rem ops with
+ // bit-masks and shifts.
+ assert(isPowerOf2_32(RootMask.size()) && "Non-power-of-2 shuffle mask sizes");
+ assert(isPowerOf2_32(OpMask.size()) && "Non-power-of-2 shuffle mask sizes");
+ unsigned RootMaskSizeLog2 = countTrailingZeros(RootMask.size());
+ unsigned OpMaskSizeLog2 = countTrailingZeros(OpMask.size());
+
+ unsigned MaskWidth = std::max<unsigned>(OpMask.size(), RootMask.size());
+ unsigned RootRatio = std::max<unsigned>(1, OpMask.size() >> RootMaskSizeLog2);
+ unsigned OpRatio = std::max<unsigned>(1, RootMask.size() >> OpMaskSizeLog2);
+ assert((RootRatio == 1 || OpRatio == 1) &&
"Must not have a ratio for both incoming and op masks!");
- SmallVector<int, 64> Mask((unsigned)MaskWidth, SM_SentinelUndef);
+ assert(isPowerOf2_32(MaskWidth) && "Non-power-of-2 shuffle mask sizes");
+ assert(isPowerOf2_32(RootRatio) && "Non-power-of-2 shuffle mask sizes");
+ assert(isPowerOf2_32(OpRatio) && "Non-power-of-2 shuffle mask sizes");
+ unsigned RootRatioLog2 = countTrailingZeros(RootRatio);
+ unsigned OpRatioLog2 = countTrailingZeros(OpRatio);
+
+ SmallVector<int, 64> Mask(MaskWidth, SM_SentinelUndef);
// Merge this shuffle operation's mask into our accumulated mask. Note that
// this shuffle's mask will be the first applied to the input, followed by the
// root mask to get us all the way to the root value arrangement. The reason
// for this order is that we are recursing up the operation chain.
- for (int i = 0; i < MaskWidth; ++i) {
- int RootIdx = i / RootRatio;
+ for (unsigned i = 0; i < MaskWidth; ++i) {
+ unsigned RootIdx = i >> RootRatioLog2;
if (RootMask[RootIdx] < 0) {
// This is a zero or undef lane, we're done.
Mask[i] = RootMask[RootIdx];
continue;
}
- int RootMaskedIdx = RootMask[RootIdx] * RootRatio + i % RootRatio;
+ unsigned RootMaskedIdx =
+ RootMask[RootIdx] * RootRatio + (i & (RootRatio - 1));
// Just insert the scaled root mask value if it references an input other
// than the SrcOp we're currently inserting.
@@ -28001,19 +28016,20 @@
continue;
}
- RootMaskedIdx %= MaskWidth;
+ RootMaskedIdx = RootMaskedIdx & (MaskWidth - 1);
- int OpIdx = RootMaskedIdx / OpRatio;
+ unsigned OpIdx = RootMaskedIdx >> OpRatioLog2;
if (OpMask[OpIdx] < 0) {
// The incoming lanes are zero or undef, it doesn't matter which ones we
// are using.
Mask[i] = OpMask[OpIdx];
continue;
}
// Ok, we have non-zero lanes, map them through to one of the Op's inputs.
- int OpMaskedIdx = OpMask[OpIdx] * OpRatio + RootMaskedIdx % OpRatio;
- OpMaskedIdx %= MaskWidth;
+ unsigned OpMaskedIdx =
+ OpMask[OpIdx] * OpRatio + (RootMaskedIdx & (OpRatio - 1));
+ OpMaskedIdx = OpMaskedIdx & (MaskWidth - 1);
if (OpMask[OpIdx] < (int)OpMask.size()) {
assert(0 <= InputIdx0 && "Unknown target shuffle input");
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D34174.102421.patch
Type: text/x-patch
Size: 3783 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170613/121bf3a4/attachment.bin>