[llvm] [X86] Fold VPERMV(MASK,CONCAT(LO,HI)) -> VPERMV3(WIDEN(LO),MASK',WIDEN(HI)) (PR #129708)

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 6 02:53:03 PST 2025


================
@@ -42607,6 +42607,43 @@ static SDValue combineTargetShuffle(SDValue N, const SDLoc &DL,
 
     return SDValue();
   }
+  case X86ISD::VPERMV: {
+    // Combine VPERMV to VPERMV3 if the source operand can be freely split.
+    SmallVector<int, 32> Mask;
+    SmallVector<SDValue, 2> SrcOps, SubOps;
+    SDValue Src = peekThroughBitcasts(N.getOperand(1));
+    if ((Subtarget.hasVLX() || VT.is512BitVector()) &&
+        getTargetShuffleMask(N, /*AllowSentinelZero=*/false, SrcOps, Mask) &&
+        collectConcatOps(Src.getNode(), SubOps, DAG)) {
+      assert(Mask.size() == NumElts && "Unexpected shuffle mask size");
+      assert(SrcOps.size() == 1 && "Unexpected shuffle ops");
+      assert((SubOps.size() == 2 || SubOps.size() == 4) &&
+             "Unexpected split ops");
+      // Bail if we were permuting a widened vector.
+      if ((SubOps.size() == 2 && SubOps[1].isUndef()) ||
+          (SubOps.size() == 4 && SubOps[2].isUndef() && SubOps[3].isUndef()))
----------------
RKSimon wrote:

I had assumed that for the 4x128-bit cases we're just limiting this to cases where the upper 256 bits are undef, and that it should be fine for the lower 2x128-bit pair to be functional. I'll check the effect of restricting this to just the lowest subvector.
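
To make the two conditions concrete, here is a minimal standalone sketch, not the patch itself. It models each subvector only by whether it is undef, using a hypothetical `SubOpUndef` vector in place of `SubOps[i].isUndef()`. `bailsIfUpperHalfUndef` mirrors the check in the hunk above; `bailsIfOnlyLowestDefined` is one possible reading of "restricting this to just the lowest subvector" and is an assumption, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SubOps[i].isUndef(): true means subvector i is undef.
using SubOpUndef = std::vector<bool>;

// Mirrors the condition in the hunk: bail when the upper half of the
// concatenation is undef (element 1 of 2, or elements 2 and 3 of 4).
bool bailsIfUpperHalfUndef(const SubOpUndef &U) {
  return (U.size() == 2 && U[1]) || (U.size() == 4 && U[2] && U[3]);
}

// Assumed stricter variant: bail only when every subvector except the
// lowest one is undef, i.e. the source is a single widened subvector.
bool bailsIfOnlyLowestDefined(const SubOpUndef &U) {
  if (U.size() < 2)
    return false;
  for (std::size_t I = 1; I < U.size(); ++I)
    if (!U[I])
      return false;
  return true;
}
```

Under this model the two predicates differ only for a 4x128-bit source whose lower pair is live and whose upper pair is undef: the first bails, the second does not.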

https://github.com/llvm/llvm-project/pull/129708

