[llvm] [X86] combineX86ShuffleChain - don't combine to VPERMI2W/VPERMI2B from just any single variable mask (PR #127914)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 20 00:22:45 PST 2025
================
@@ -40064,23 +40064,29 @@ static SDValue combineX86ShuffleChain(ArrayRef<SDValue> Inputs, SDValue Root,
if (Depth < 1)
return SDValue();
- bool HasVariableMask = llvm::any_of(SrcNodes, [](const SDNode *N) {
+ int NumVariableMasks = llvm::count_if(SrcNodes, [](const SDNode *N) {
return isTargetShuffleVariableMask(N->getOpcode());
});
+ bool HasSlowVariableMask = llvm::any_of(SrcNodes, [](const SDNode *N) {
+ return (N->getOpcode() == X86ISD::VPERMV3 ||
+ N->getOpcode() == X86ISD::VPERMV);
+ });
// Depth threshold above which we can efficiently use variable mask shuffles.
int VariableCrossLaneShuffleDepth =
Subtarget.hasFastVariableCrossLaneShuffle() ? 1 : 2;
int VariablePerLaneShuffleDepth =
Subtarget.hasFastVariablePerLaneShuffle() ? 1 : 2;
AllowVariableCrossLaneMask &=
- (Depth >= VariableCrossLaneShuffleDepth) || HasVariableMask;
+ (Depth >= VariableCrossLaneShuffleDepth) || NumVariableMasks;
AllowVariablePerLaneMask &=
- (Depth >= VariablePerLaneShuffleDepth) || HasVariableMask;
+ (Depth >= VariablePerLaneShuffleDepth) || NumVariableMasks;
// VPERMI2W/VPERMI2B are 3 uops on Skylake and Icelake so we require a
// higher depth before combining them.
+ int BWIVPERMV3ShuffleDepth =
+ VariableCrossLaneShuffleDepth + 2 - NumVariableMasks;
----------------
RKSimon wrote:
Done
https://github.com/llvm/llvm-project/pull/127914
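
For readers following the arithmetic in the hunk above, here is a minimal standalone sketch of how the new threshold moves as more variable-mask shuffles appear in the chain. It is not the LLVM implementation: `ShuffleNode`, `countVariableMasks` and `bwiVPERMV3ShuffleDepth` are hypothetical names, and the `IsVariableMask` flag stands in for the `isTargetShuffleVariableMask(N->getOpcode())` check. The quoted hunk does not show how `BWIVPERMV3ShuffleDepth` is consumed afterwards, so only its computation is modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an SDNode: only records whether the node's
// opcode would satisfy isTargetShuffleVariableMask().
struct ShuffleNode {
  bool IsVariableMask;
};

// Mirrors the llvm::count_if call in the patch, using std::count_if.
int countVariableMasks(const std::vector<ShuffleNode> &SrcNodes) {
  return static_cast<int>(
      std::count_if(SrcNodes.begin(), SrcNodes.end(),
                    [](const ShuffleNode &N) { return N.IsVariableMask; }));
}

// Mirrors the BWIVPERMV3ShuffleDepth expression from the patch.
// FastCrossLane models Subtarget.hasFastVariableCrossLaneShuffle().
int bwiVPERMV3ShuffleDepth(bool FastCrossLane, int NumVariableMasks) {
  int VariableCrossLaneShuffleDepth = FastCrossLane ? 1 : 2;
  return VariableCrossLaneShuffleDepth + 2 - NumVariableMasks;
}
```

Under these assumptions, a chain with no variable masks on a target without fast cross-lane shuffles gets a threshold of 4, and each variable mask already present in the chain lowers that threshold by one.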