[llvm] [X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (PR #133923)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 2 02:42:43 PDT 2025
================
@@ -43814,6 +43815,66 @@ bool X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
}
break;
}
+ case X86ISD::VPERMV: {
+ SmallVector<int, 16> Mask;
+ SmallVector<SDValue, 2> Ops;
+ // TODO: Handle 128-bit PERMD/Q -> PSHUFD
+ if (Subtarget.hasVLX() &&
+ (VT.is512BitVector() || VT.getScalarSizeInBits() <= 16) &&
+ getTargetShuffleMask(Op, /*AllowSentinelZero=*/false, Ops, Mask)) {
+ // For lane-crossing shuffles, only split in half in case we're still
+ // referencing higher elements.
+ unsigned HalfElts = NumElts / 2;
+ unsigned HalfSize = SizeInBits / 2;
+ Mask.resize(HalfElts);
+ if (all_of(Mask,
+ [&](int M) { return isUndefOrInRange(M, 0, HalfElts); })) {
----------------
RKSimon wrote:
At about line 43714:
```cpp
// For 256/512-bit ops that are 128/256-bit ops glued together, if we do not
// demand any of the high elements, then narrow the op to 128/256-bits: e.g.
// (op ymm0, ymm1) --> insert undef, (op xmm0, xmm1), 0
if ((VT.is256BitVector() || VT.is512BitVector()) &&
DemandedElts.lshr(NumElts / 2) == 0) {
unsigned SizeInBits = VT.getSizeInBits();
unsigned ExtSizeInBits = SizeInBits / 2;
// See if 512-bit ops only use the bottom 128-bits.
if (VT.is512BitVector() && DemandedElts.lshr(NumElts / 4) == 0)
ExtSizeInBits = SizeInBits / 4;
```
These "vector width reduction" folds run after the standard SimplifyDemandedVectorElts simplifications, which are handled earlier in SimplifyDemandedVectorEltsForTargetNode.
https://github.com/llvm/llvm-project/pull/133923