[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Anton Afanasyev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 10 05:21:56 PDT 2021


anton-afanasyev marked 6 inline comments as done.
anton-afanasyev added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2871-2881
+      unsigned MinIndex = INT_MAX;
+      for (Value *V : VL)
+        MinIndex = std::min(MinIndex, *getInsertIndex(V, 0));
+
+      bool IsConsecutive = true;
+      for (unsigned I = 0, E = VL.size(); I < E; I++)
+        IsConsecutive &= (I == *getInsertIndex(VL[I], 0) - MinIndex);
----------------
ABataev wrote:
> I think this can be simplified:
> ```
> int Offset = *getInsertIndex(VL[0], 0);
> ValueList Operands(VL.size());
> Operands[0] = cast<Instruction>(VL[0])->getOperand(1);
> bool IsConsecutive = true;
> for (unsigned I = 1, E = VL.size(); I < E; ++I) {
>   IsConsecutive &= (I == *getInsertIndex(VL[I], 0) - Offset);
>   Operands[I] = cast<Instruction>(VL[I])->getOperand(1);
> }
> ```
Yes, sure, after refactoring we expect consecutive operands here and this can be simplified. Did this way.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2883
+
+      if (IsConsecutive) {
+        TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
----------------
ABataev wrote:
> If not consecutive, we can include the cost of single permute and still vectorize it, no?
Inserts here are coming from `findBuildAggregate()`, being already sorted by index and therefore consecutive, the only exclusion is that they can have gaps, so we check it here. I don't think we should process this rare case within this patch.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3666-3667
+      APInt DemandedElts = APInt::getNullValue(NumElts);
+      for (auto *V : VL)
+        DemandedElts.setBit(*getInsertIndex(V, 0));
+
----------------
ABataev wrote:
> Hm, I think you need to normalize insert indices here, the `MinIndex` (offset) might be non `0` but should be 0. Plus, I think, need to add (subtract from vectcost) a cost of subvector insertion from Index `MinIndex` (if `MinIndex != 0`)
Why do we need to normalize insert indices? `DemandElts` is passed to `getScalarizationOverhead()` which just summarizes cost of all eliminated insertelements (with non-normalized operand indices).


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4236-4240
+      if (NumInsertsUsed < Scalars.size()) {
+        auto *VecTy = cast<FixedVectorType>(EU.Scalar->getType());
+        ExtractCost += TTI->getShuffleCost(
+            TargetTransformInfo::SK_PermuteSingleSrc, VecTy, Mask);
+      }
----------------
ABataev wrote:
> I think this should be:
> ```
> ExtractCost += TTI->getShuffleCost(TTI::SK_InsertSubvector, VL0->getType(), MinIndex, VecTy);
> ```
I've removed this all, since using vectorized value without extraction now.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5166
+    if (auto *Insert = dyn_cast<InsertElementInst>(Scalar)) {
+      Builder.SetInsertPoint(cast<Instruction>(User));
+      Value *V = Vec;
----------------
ABataev wrote:
> What if the user is Phi in thу middle of other phis?
Oh, sure, fixed this.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5172-5174
+      unsigned MinIndex = INT_MAX;
+      for (auto *S : Scalars)
+        MinIndex = std::min(MinIndex, *getInsertIndex(S, 0));
----------------
ABataev wrote:
> You're using this code in many places, make it a function.
I've just got rid of this code using the fact we have `Scalars` already sorted by index.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5184-5185
+
+      if (NumInsertsUsed != Scalars.size())
+        V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()), Mask);
+
----------------
ABataev wrote:
> Why we can't use just `V` here? 
Hmm, here we are preparing shuffled `V` with undefs elements if `V` is used without several elements inserted (before vectorizer). But undefined positions accept anything, so we can actually use completely filled `V` here. Thanks, I've changed it this way and removed all redundand code!

The only rare case could be if any further instruction inserts element already inserted (having the same index), so we have to exclude this case. But this is guaranteed for inserts coming from `findBuildAggregate()`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714



More information about the llvm-commits mailing list