[PATCH] D115462: [SLP]Improve shuffles cost estimation where possible.

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 26 09:54:37 PDT 2022


RKSimon added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6052
         Cost += TTI->getShuffleCost(
-            TargetTransformInfo::SK_PermuteSingleSrc,
-            FixedVectorType::get(SrcVecTy->getElementType(), Sz));
-      } else if (!IsIdentity) {
-        auto *FirstInsert =
-            cast<Instruction>(*find_if(E->Scalars, [E](Value *V) {
-              return !is_contained(E->Scalars,
-                                   cast<Instruction>(V)->getOperand(0));
-            }));
-        if (isUndefVector(FirstInsert->getOperand(0))) {
-          Cost += TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, SrcVecTy, Mask);
-        } else {
-          SmallVector<int> InsertMask(NumElts);
-          std::iota(InsertMask.begin(), InsertMask.end(), 0);
-          for (unsigned I = 0; I < NumElts; I++) {
-            if (Mask[I] != UndefMaskElem)
-              InsertMask[Offset + I] = NumElts + I;
-          }
-          Cost +=
-              TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, SrcVecTy, InsertMask);
-        }
-      }
+            TTI::SK_Select,
+            NumOfParts > 0
----------------
ABataev wrote:
> dmgreen wrote:
> > ABataev wrote:
> > > dmgreen wrote:
> > > > I'm not sure I understand why this would be an SK_Select. That is a bit of an X86 special as far as I understand and doesn't always correlate well to other architectures. Why is the Mask missing too? That might be enough to help avoid the regressions if it was re-added.
> > > 1. It is a permutation of 2 sub-vectors: the root of the buildvector and a subvector after the vectorization. Since it was a buildvector, the compiler selects elements from the root and corresponding elements from the resulting vector.
> > > 
> > > 2. Mask is not required if TTI::SK_Select is used; the mask is used only with SK_PermuteSingleSrc and SK_PermuteTwoSrc.
> > > 
> > > But I'll check it.
> > AArch64 (and most other architectures AFAIU) do not have SK_Select shuffles, so it is not a lot better than SK_PermuteTwoSrc. A Mask can help to improve the cost though, if the backend can come up with something more accurate for it.
> > 
> > I'm surprised this is not a SK_InsertSubvector with adjacent elements though - that seems like the most natural fit, unless I'm missing how this works.
> Yep, you're right, it must be an InsertSubvector kind. I changed it to Select because some costs for InsertSubvector were not implemented.
Was this on x86 / AArch64 or some other target?
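
For reference, a minimal standalone sketch of the mask shapes discussed above. It rebuilds the two-source InsertMask from the removed hunk and checks whether it has a select-like shape (each lane reads the same lane of either source) or an insert-subvector-like shape (one contiguous run of second-source elements placed into an otherwise identity mask). It does not use LLVM: `kUndefMaskElem`, `buildInsertMask`, `isSelectStyleMask` and `isInsertSubvectorStyleMask` are hypothetical names, the classifiers are simplified assumptions rather than LLVM's own mask predicates, and the bounds guard in the loop is added for safety and is not in the patch.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Local stand-in for LLVM's UndefMaskElem (assumed to be -1 here).
constexpr int kUndefMaskElem = -1;

// Mirrors the removed hunk: start from the identity over the first source,
// then redirect lane Offset + I to element I of the second source
// (index NumElts + I) wherever Mask[I] is defined.
std::vector<int> buildInsertMask(const std::vector<int> &Mask,
                                 unsigned Offset) {
  const unsigned NumElts = static_cast<unsigned>(Mask.size());
  std::vector<int> InsertMask(NumElts);
  std::iota(InsertMask.begin(), InsertMask.end(), 0);
  for (unsigned I = 0; I < NumElts; ++I) {
    // The bounds check is an assumption added for this sketch.
    if (Mask[I] != kUndefMaskElem && Offset + I < NumElts)
      InsertMask[Offset + I] = static_cast<int>(NumElts + I);
  }
  return InsertMask;
}

// Select-like: lane I reads lane I of the first or the second source.
bool isSelectStyleMask(const std::vector<int> &M) {
  const int N = static_cast<int>(M.size());
  for (int I = 0; I < N; ++I)
    if (M[I] != I && M[I] != N + I)
      return false;
  return true;
}

// Insert-subvector-like: identity lanes, then one contiguous run reading
// second-source elements 0, 1, 2, ... in order, then identity lanes again.
bool isInsertSubvectorStyleMask(const std::vector<int> &M) {
  const int N = static_cast<int>(M.size());
  int I = 0;
  while (I < N && M[I] == I)
    ++I; // leading identity lanes
  int Len = 0;
  while (I < N && M[I] == N + Len) {
    ++I;
    ++Len; // inserted run
  }
  if (Len == 0)
    return false; // nothing was inserted
  while (I < N && M[I] == I)
    ++I; // trailing identity lanes
  return I == N;
}
```

With NumElts = 4 and Mask = {0, 1, undef, undef}, Offset = 2 gives {0, 1, 4, 5}, which fits the insert-subvector shape but not the select shape in this sketch; Offset = 0 gives {4, 5, 2, 3}, which fits both.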


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115462/new/

https://reviews.llvm.org/D115462


