[llvm] [X86] Fold VPERMV(MASK,CONCAT(LO,HI)) -> VPERMV3(WIDEN(LO),MASK',WIDEN(HI)) (PR #129708)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 6 02:53:03 PST 2025
================
@@ -42607,6 +42607,43 @@ static SDValue combineTargetShuffle(SDValue N, const SDLoc &DL,
return SDValue();
}
+ case X86ISD::VPERMV: {
+ // Combine VPERMV to VPERMV3 if the source operand can be freely split.
+ SmallVector<int, 32> Mask;
+ SmallVector<SDValue, 2> SrcOps, SubOps;
+ SDValue Src = peekThroughBitcasts(N.getOperand(1));
+ if ((Subtarget.hasVLX() || VT.is512BitVector()) &&
+ getTargetShuffleMask(N, /*AllowSentinelZero=*/false, SrcOps, Mask) &&
+ collectConcatOps(Src.getNode(), SubOps, DAG)) {
+ assert(Mask.size() == NumElts && "Unexpected shuffle mask size");
+ assert(SrcOps.size() == 1 && "Unexpected shuffle ops");
+ assert((SubOps.size() == 2 || SubOps.size() == 4) &&
+ "Unexpected split ops");
+ // Bail if we were permuting a widened vector.
+ if ((SubOps.size() == 2 && SubOps[1].isUndef()) ||
+ (SubOps.size() == 4 && SubOps[2].isUndef() && SubOps[3].isUndef()))
----------------
RKSimon wrote:
I had assumed that for the 4x128 cases we're only limiting this to cases where the upper 256 bits are undef, and that it should be fine for the lower 2x128-bit pair to be functional. I'll check the effect of restricting this to just the lowest subvector.
https://github.com/llvm/llvm-project/pull/129708
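
For readers following the fold named in the subject line, here is a minimal scalar sketch of the index remapping for the 2-subvector case. It is not the code from the patch: `permv`, `permv3`, `widen`, `remapMask` and `foldIsEquivalent` are hypothetical helpers that model vectors as `std::vector<int>`, and the sketch assumes that a mask index selecting from HI must be shifted up by half the element count so that it lands in the second VPERMV3 operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of a single-source variable permute: Result[i] = Src[Mask[i]].
Vec permv(const Vec &Mask, const Vec &Src) {
  Vec R(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    R[I] = Src[Mask[I]];
  return R;
}

// Model of a two-source variable permute: indices in [0, N) select from A,
// indices in [N, 2N) select from B, where N is the element count of A.
Vec permv3(const Vec &A, const Vec &Mask, const Vec &B) {
  const int N = static_cast<int>(A.size());
  Vec R(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    const int M = Mask[I];
    R[I] = M < N ? A[M] : B[M - N];
  }
  return R;
}

// Widen a half-width vector to full width. The upper elements are
// don't-care; they are modelled here with -1.
Vec widen(const Vec &Half) {
  Vec W(Half.size() * 2, -1);
  for (std::size_t I = 0; I < Half.size(); ++I)
    W[I] = Half[I];
  return W;
}

// Assumed remapping (hypothetical helper): indices into the LO half are
// unchanged, indices into the HI half move into the second operand.
Vec remapMask(const Vec &Mask) {
  const int N = static_cast<int>(Mask.size());
  const int Half = N / 2;
  Vec Out(Mask);
  for (int &M : Out) {
    assert(M >= 0 && M < N && "sentinel or out-of-range index not modelled");
    if (M >= Half)
      M += Half;
  }
  return Out;
}

// Checks that permuting CONCAT(Lo, Hi) matches the two-source form.
// Requires Mask.size() == Lo.size() + Hi.size() and Lo.size() == Hi.size().
bool foldIsEquivalent(const Vec &Mask, const Vec &Lo, const Vec &Hi) {
  Vec Concat(Lo);
  Concat.insert(Concat.end(), Hi.begin(), Hi.end());
  return permv(Mask, Concat) == permv3(widen(Lo), remapMask(Mask), widen(Hi));
}
```

The sketch covers only defined mask indices and two equal-sized subvectors. It does not model the 4x128 case or the undef-subvector bail discussed in the comment above.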