[llvm] [X86] combineX86ShuffleChain - always prefer VPERMQ/PD for unary subvector shuffles on AVX2+ targets (PR #134849)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 8 07:45:42 PDT 2025
================
@@ -39838,15 +39838,14 @@ static SDValue combineX86ShuffleChain(
return insertSubVector(Lo, Hi, NumRootElts / 2, DAG, DL, 128);
}
- if (Depth == 0 && RootOpc == X86ISD::VPERM2X128)
- return SDValue(); // Nothing to do!
-
// If we have AVX2, prefer to use VPERMQ/VPERMPD for unary shuffles unless
// we need to use the zeroing feature.
// Prefer blends for sequential shuffles unless we are optimizing for size.
if (UnaryShuffle &&
!(Subtarget.hasAVX2() && isUndefOrInRange(Mask, 0, 2)) &&
----------------
RKSimon wrote:
The `return SDValue()` early-outs are only used if we've already combined to the same type of node that we want (and the Depth == 0 check means we wouldn't be combining any additional shuffles into it) - this prevents infinite loops when we peek through bitcasts etc. This patch moves the early-out inside the specific case used for AVX1 (or if AVX2 can't use VPERMQ/PD). Actual VPERMQ/PD matching occurs later on in combineX86ShuffleChain in the matchUnaryPermuteShuffle call.
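As a rough illustration of the control flow described above, here is a standalone toy model, not the actual LLVM code. The `Opcode` and `Action` enums, the `decide` function, and the `restOfCondition` parameter (a stand-in for the part of the `if` condition that the hunk cuts off) are all hypothetical names invented for this sketch. The sentinel values mirror LLVM's undef/zero shuffle mask sentinels.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for the real node opcodes and combine results.
enum class Opcode { Other, VPERM2X128 };
enum class Action { NoChange, EmitVPERM2X128, FallThrough };

constexpr int SentinelUndef = -1; // mask element is undefined
constexpr int SentinelZero = -2;  // mask element must be zeroed

// True if every mask element is undef or lies in [lo, hi).
inline bool isUndefOrInRange(const std::array<int, 2> &mask, int lo, int hi) {
  for (int m : mask)
    if (m != SentinelUndef && (m < lo || m >= hi))
      return false;
  return true;
}

// Toy model of the subvector-shuffle decision after the patch.
// restOfCondition is an assumption: it stands in for the remaining
// terms of the condition that are not visible in the quoted hunk.
inline Action decide(bool unaryShuffle, bool hasAVX2,
                     const std::array<int, 2> &mask, bool restOfCondition,
                     int depth, Opcode rootOpc) {
  assert(depth >= 0 && "depth counts shuffles already combined");

  // AVX2 can use VPERMQ/PD when no element needs zeroing.
  const bool canUsePermQ = hasAVX2 && isUndefOrInRange(mask, 0, 2);

  if (unaryShuffle && !canUsePermQ && restOfCondition) {
    // The early-out now lives inside this case: the root is already the
    // node we would create and nothing else was combined into it, so
    // report "no change" instead of recreating it.
    if (depth == 0 && rootOpc == Opcode::VPERM2X128)
      return Action::NoChange;
    return Action::EmitVPERM2X128;
  }

  // Otherwise keep going; in the real code later matching (for example
  // matchUnaryPermuteShuffle) gets a chance to pick VPERMQ/PD.
  return Action::FallThrough;
}
```

In this model an AVX2 target with a VPERM2X128 root and an in-range unary mask no longer stops at the early-out but falls through to later matching, whereas an AVX1 target (or an AVX2 mask that needs zeroing) still returns without changes.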
https://github.com/llvm/llvm-project/pull/134849