[llvm] 43d48ed - [PowerPC] Add option to disable perfect shuffle
Qiu Chaofan via llvm-commits
llvm-commits at lists.llvm.org
Sun Feb 20 09:41:12 PST 2022
Author: Qiu Chaofan
Date: 2022-02-21T01:39:35+08:00
New Revision: 43d48ed22029e92d88c608c55c6c42490ec3a243
URL: https://github.com/llvm/llvm-project/commit/43d48ed22029e92d88c608c55c6c42490ec3a243
DIFF: https://github.com/llvm/llvm-project/commit/43d48ed22029e92d88c608c55c6c42490ec3a243.diff
LOG: [PowerPC] Add option to disable perfect shuffle
Perfect shuffle was introduced into the PowerPC backend years ago and is
only available on big-endian subtargets. This optimization works well in
simple cases, but it has a serious negative impact on large programs with
many shuffle instructions sharing the same mask, where a single vperm with
an already materialized mask would be cheaper.
This patch introduces a temporary hidden backend option to control it until
we implement a better way to close the gap in vector shuffle decomposition.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120072
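As a rough illustration of the gating described above, the following standalone C++ sketch models the decision the patched block of LowerVECTOR_SHUFFLE makes once the mask has been analysed. The parameters mirror DisablePerfectShuffle, isLittleEndian, isFourElementShuffle and PFEntry from the diff, but chooseLowering and ShuffleLowering are hypothetical names. This is not LLVM code, and it ignores the other lowerings the real function tries before reaching this point.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; this models only the perfect-shuffle block of the diff.
enum class ShuffleLowering { PerfectShuffle, VPERM };

ShuffleLowering chooseLowering(bool disablePerfectShuffle, bool isLittleEndian,
                               bool isFourElementShuffle, std::uint32_t pfEntry) {
  // Mirrors `if (!DisablePerfectShuffle && !isLittleEndian)`: the table is
  // big-endian only, and the hidden option can switch the path off entirely.
  if (!disablePerfectShuffle && !isLittleEndian && isFourElementShuffle) {
    // The cost is stored in the top two bits of the table entry.
    unsigned cost = pfEntry >> 30;
    assert(cost <= 3 && "cost is a 2-bit field");
    // Only expand into discrete instructions when the table says it is cheap.
    if (cost < 3)
      return ShuffleLowering::PerfectShuffle;
  }
  // Otherwise the shuffle falls through to the VPERM lowering.
  return ShuffleLowering::VPERM;
}
```

Because the option is declared with cl::Hidden, it is accepted by llc as -ppc-disable-perfect-shuffle but is not listed in the default -help output.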
Added:
Modified:
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
Removed:
################################################################################
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 35e3f4e697e2d..7910ba899993b 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -126,6 +126,11 @@ static cl::opt<bool> EnableQuadwordAtomics(
cl::desc("enable quadword lock-free atomic operations"), cl::init(false),
cl::Hidden);
+static cl::opt<bool>
+ DisablePerfectShuffle("ppc-disable-perfect-shuffle",
+ cl::desc("disable vector permute decomposition"),
+ cl::init(false), cl::Hidden);
+
STATISTIC(NumTailCalls, "Number of tail calls");
STATISTIC(NumSiblingCalls, "Number of sibling calls");
STATISTIC(ShufflesHandledWithVPERM, "Number of shuffles lowered to a VPERM");
@@ -10071,56 +10076,59 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
// perfect shuffle table to emit an optimal matching sequence.
ArrayRef<int> PermMask = SVOp->getMask();
- unsigned PFIndexes[4];
- bool isFourElementShuffle = true;
- for (unsigned i = 0; i != 4 && isFourElementShuffle; ++i) { // Element number
- unsigned EltNo = 8; // Start out undef.
- for (unsigned j = 0; j != 4; ++j) { // Intra-element byte.
- if (PermMask[i*4+j] < 0)
- continue; // Undef, ignore it.
-
- unsigned ByteSource = PermMask[i*4+j];
- if ((ByteSource & 3) != j) {
- isFourElementShuffle = false;
- break;
- }
+ if (!DisablePerfectShuffle && !isLittleEndian) {
+ unsigned PFIndexes[4];
+ bool isFourElementShuffle = true;
+ for (unsigned i = 0; i != 4 && isFourElementShuffle;
+ ++i) { // Element number
+ unsigned EltNo = 8; // Start out undef.
+ for (unsigned j = 0; j != 4; ++j) { // Intra-element byte.
+ if (PermMask[i * 4 + j] < 0)
+ continue; // Undef, ignore it.
+
+ unsigned ByteSource = PermMask[i * 4 + j];
+ if ((ByteSource & 3) != j) {
+ isFourElementShuffle = false;
+ break;
+ }
- if (EltNo == 8) {
- EltNo = ByteSource/4;
- } else if (EltNo != ByteSource/4) {
- isFourElementShuffle = false;
- break;
+ if (EltNo == 8) {
+ EltNo = ByteSource / 4;
+ } else if (EltNo != ByteSource / 4) {
+ isFourElementShuffle = false;
+ break;
+ }
}
+ PFIndexes[i] = EltNo;
+ }
+
+ // If this shuffle can be expressed as a shuffle of 4-byte elements, use the
+ // perfect shuffle vector to determine if it is cost effective to do this as
+ // discrete instructions, or whether we should use a vperm.
+ // For now, we skip this for little endian until such time as we have a
+ // little-endian perfect shuffle table.
+ if (isFourElementShuffle) {
+ // Compute the index in the perfect shuffle table.
+ unsigned PFTableIndex = PFIndexes[0] * 9 * 9 * 9 + PFIndexes[1] * 9 * 9 +
+ PFIndexes[2] * 9 + PFIndexes[3];
+
+ unsigned PFEntry = PerfectShuffleTable[PFTableIndex];
+ unsigned Cost = (PFEntry >> 30);
+
+ // Determining when to avoid vperm is tricky. Many things affect the cost
+ // of vperm, particularly how many times the perm mask needs to be
+ // computed. For example, if the perm mask can be hoisted out of a loop or
+ // is already used (perhaps because there are multiple permutes with the
+ // same shuffle mask?) the vperm has a cost of 1. OTOH, hoisting the
+ // permute mask out of the loop requires an extra register.
+ //
+ // As a compromise, we only emit discrete instructions if the shuffle can
+ // be generated in 3 or fewer operations. When we have loop information
+ // available, if this block is within a loop, we should avoid using vperm
+ // for 3-operation perms and use a constant pool load instead.
+ if (Cost < 3)
+ return GeneratePerfectShuffle(PFEntry, V1, V2, DAG, dl);
}
- PFIndexes[i] = EltNo;
- }
-
- // If this shuffle can be expressed as a shuffle of 4-byte elements, use the
- // perfect shuffle vector to determine if it is cost effective to do this as
- // discrete instructions, or whether we should use a vperm.
- // For now, we skip this for little endian until such time as we have a
- // little-endian perfect shuffle table.
- if (isFourElementShuffle && !isLittleEndian) {
- // Compute the index in the perfect shuffle table.
- unsigned PFTableIndex =
- PFIndexes[0]*9*9*9+PFIndexes[1]*9*9+PFIndexes[2]*9+PFIndexes[3];
-
- unsigned PFEntry = PerfectShuffleTable[PFTableIndex];
- unsigned Cost = (PFEntry >> 30);
-
- // Determining when to avoid vperm is tricky. Many things affect the cost
- // of vperm, particularly how many times the perm mask needs to be computed.
- // For example, if the perm mask can be hoisted out of a loop or is already
- // used (perhaps because there are multiple permutes with the same shuffle
- // mask?) the vperm has a cost of 1. OTOH, hoisting the permute mask out of
- // the loop requires an extra register.
- //
- // As a compromise, we only emit discrete instructions if the shuffle can be
- // generated in 3 or fewer operations. When we have loop information
- // available, if this block is within a loop, we should avoid using vperm
- // for 3-operation perms and use a constant pool load instead.
- if (Cost < 3)
- return GeneratePerfectShuffle(PFEntry, V1, V2, DAG, dl);
}
// Lower this to a VPERM(V1, V2, V3) expression, where V3 is a constant
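The mask analysis in the hunk above is easier to follow outside the SelectionDAG. The sketch below restates it over a plain std::vector<int> of 16 byte indices (0-15 select bytes of V1, 16-31 bytes of V2, negative values are undef). getFourElementIndexes and getPerfectShuffleTableIndex are hypothetical helper names, not functions in PPCISelLowering.cpp, and the sketch does not consult the real PerfectShuffleTable.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <vector>

// Returns the four word indices (0-7, or 8 for an all-undef word) if the
// 16-byte mask only moves whole 4-byte elements; std::nullopt otherwise.
std::optional<std::array<unsigned, 4>>
getFourElementIndexes(const std::vector<int> &permMask) {
  assert(permMask.size() == 16 && "expected a 16-byte shuffle mask");
  std::array<unsigned, 4> pfIndexes{};
  for (unsigned i = 0; i != 4; ++i) {   // Element number.
    unsigned eltNo = 8;                 // Start out undef.
    for (unsigned j = 0; j != 4; ++j) { // Intra-element byte.
      if (permMask[i * 4 + j] < 0)
        continue;                       // Undef byte, ignore it.
      unsigned byteSource = permMask[i * 4 + j];
      // Byte j of the result word must come from byte j of a source word.
      if ((byteSource & 3) != j)
        return std::nullopt;
      // All defined bytes of the result word must come from the same word.
      if (eltNo == 8)
        eltNo = byteSource / 4;
      else if (eltNo != byteSource / 4)
        return std::nullopt;
    }
    pfIndexes[i] = eltNo;
  }
  return pfIndexes;
}

// Each word index is a base-9 digit (8 source words plus undef), so there
// are 9^4 = 6561 possible table indices.
unsigned getPerfectShuffleTableIndex(const std::array<unsigned, 4> &idx) {
  return idx[0] * 9 * 9 * 9 + idx[1] * 9 * 9 + idx[2] * 9 + idx[3];
}
```

Returning std::optional in place of the diff's isFourElementShuffle flag makes the early exits explicit without changing the logic. The identity mask maps to index 1*81 + 2*9 + 3 = 102, and an all-undef mask maps to the largest index, 8*729 + 8*81 + 8*9 + 8 = 6560.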