[llvm] [X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (PR #133923)

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 2 02:42:43 PDT 2025


================
@@ -43814,6 +43815,66 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
       }
       break;
     }
+    case X86ISD::VPERMV: {
+      SmallVector<int, 16> Mask;
+      SmallVector<SDValue, 2> Ops;
+      // TODO: Handle 128-bit PERMD/Q -> PSHUFD
+      if (Subtarget.hasVLX() &&
+          (VT.is512BitVector() || VT.getScalarSizeInBits() <= 16) &&
+          getTargetShuffleMask(Op, /*AllowSentinelZero=*/false, Ops, Mask)) {
+        // For lane-crossing shuffles, only split in half in case we're still
+        // referencing higher elements.
+        unsigned HalfElts = NumElts / 2;
+        unsigned HalfSize = SizeInBits / 2;
+        Mask.resize(HalfElts);
+        if (all_of(Mask,
+                   [&](int M) { return isUndefOrInRange(M, 0, HalfElts); })) {
----------------
RKSimon wrote:

At about line 43714:
```cpp
  // For 256/512-bit ops that are 128/256-bit ops glued together, if we do not
  // demand any of the high elements, then narrow the op to 128/256-bits: e.g.
  // (op ymm0, ymm1) --> insert undef, (op xmm0, xmm1), 0
  if ((VT.is256BitVector() || VT.is512BitVector()) &&
      DemandedElts.lshr(NumElts / 2) == 0) {
    unsigned SizeInBits = VT.getSizeInBits();
    unsigned ExtSizeInBits = SizeInBits / 2;

    // See if 512-bit ops only use the bottom 128-bits.
    if (VT.is512BitVector() && DemandedElts.lshr(NumElts / 4) == 0)
      ExtSizeInBits = SizeInBits / 4;
```
These "vector width reduction" folds run after the standard SimplifyDemandedVectorElts simplifications, which are handled earlier in SimplifyDemandedVectorEltsForTargetNode.
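
For readers following along without the LLVM tree, here is a minimal standalone sketch of the two conditions involved: the "upper half not demanded" test from the quoted block, and the mask test from the VPERMV case in the diff. It is an illustration only, not the code in the patch. It assumes a plain 64-bit integer in place of `APInt`, a `std::vector<int>` in place of the `SmallVector` mask, and `-1` as the undef sentinel; the names `kUndef`, `upperHalfNotDemanded` and `lowerHalfMaskIsSelfContained` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the undef shuffle-mask sentinel (assumed to be -1).
constexpr int kUndef = -1;

// Mirrors `DemandedElts.lshr(NumElts / 2) == 0`, using a 64-bit mask instead
// of APInt: true if no element in the upper half of the vector is demanded.
bool upperHalfNotDemanded(uint64_t DemandedElts, unsigned NumElts) {
  assert(NumElts > 0 && NumElts <= 64 && "sketch only handles up to 64 lanes");
  return (DemandedElts >> (NumElts / 2)) == 0;
}

// Mirrors the VPERMV check in the diff: keep only the low half of the mask
// and require every entry to be undef or to index the low half of the source.
// Assumes Mask.size() == NumElts on entry.
bool lowerHalfMaskIsSelfContained(std::vector<int> Mask, unsigned NumElts) {
  unsigned HalfElts = NumElts / 2;
  Mask.resize(HalfElts);
  return std::all_of(Mask.begin(), Mask.end(), [&](int M) {
    return M == kUndef || (M >= 0 && M < static_cast<int>(HalfElts));
  });
}
```

With 8 lanes, a demanded mask of `0x0F` passes the first check, and a shuffle mask whose low four entries are `{3, 2, 1, 0}` passes the second, so the lane-crossing permute could be done at half width. If any low entry were `4` or higher, the narrowed op would still need the upper source elements and the fold would have to be skipped.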

https://github.com/llvm/llvm-project/pull/133923

