[PATCH] D115462: [SLP]Improve shuffles cost estimation where possible.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 26 09:20:52 PDT 2022


ABataev added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6052
         Cost += TTI->getShuffleCost(
-            TargetTransformInfo::SK_PermuteSingleSrc,
-            FixedVectorType::get(SrcVecTy->getElementType(), Sz));
-      } else if (!IsIdentity) {
-        auto *FirstInsert =
-            cast<Instruction>(*find_if(E->Scalars, [E](Value *V) {
-              return !is_contained(E->Scalars,
-                                   cast<Instruction>(V)->getOperand(0));
-            }));
-        if (isUndefVector(FirstInsert->getOperand(0))) {
-          Cost += TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, SrcVecTy, Mask);
-        } else {
-          SmallVector<int> InsertMask(NumElts);
-          std::iota(InsertMask.begin(), InsertMask.end(), 0);
-          for (unsigned I = 0; I < NumElts; I++) {
-            if (Mask[I] != UndefMaskElem)
-              InsertMask[Offset + I] = NumElts + I;
-          }
-          Cost +=
-              TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, SrcVecTy, InsertMask);
-        }
-      }
+            TTI::SK_Select,
+            NumOfParts > 0
----------------
dmgreen wrote:
> ABataev wrote:
> > dmgreen wrote:
> > > I'm not sure I understand why this would be a SK_Select. That is a bit of a X86 special as far as I understand and doesn't always correlate well to other architectures. Why is the Mask missing too? That might be enough to help avoid the regressions if it was re-added.
> > 1. It is a permuatation of 2 sub-vectors: the root of the buildvector and a subvector after the vectorization. Since it was a buildvector, the compiler selects elements from the root and corresponding elements from the resulting vector.
> > 
> > 2. Mask is not required, if TTI::SK_Select is used, mask is used only with SK_PermuteSingleSrc and SK_PermuteTwoSrc.
> > 
> > But I'll check it.
> AArch64 (and most other architectures AFAIU) do not have SK_Select shuffles, so is not a lot better than SK_PermuteTwoSrc. A Mask can help to improve the cost though, if the backend can come up with something more accurate for it.
> 
> I'm surprised this is not a SK_InsertSubvector with adjacent elements though - that seems like the most natural fit, unless I'm missing how this works.
Yep, you right, it must be an InserSubvector kind, changed it to Select because some cost for InsertSubvector were not implemented.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115462/new/

https://reviews.llvm.org/D115462



More information about the llvm-commits mailing list