[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Fri Apr 16 03:45:35 PDT 2021

anton-afanasyev marked 11 inline comments as done.
anton-afanasyev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3505
     ScalarTy = CI->getOperand(0)->getType();
+  else if (InsertElementInst *IE = dyn_cast<InsertElementInst>(VL[0]))
+    ScalarTy = IE->getOperand(1)->getType();
----------------
ABataev wrote:
> `auto *IE`
Thanks, done

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3639
+      InstructionCost Cost = 0;
+      auto *SrcVecTy = cast<FixedVectorType>(VL[0]->getType());
+
----------------
ABataev wrote:
> `VL[0]`->`VL0`?
Thanks, done

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3975
+  // Vectorizing inserts of gathered values is redundant.
+  if (isa<InsertElementInst>(VectorizableTree[0]->Scalars[0]) &&
+      VectorizableTree[1]->State == TreeEntry::NeedToGather)
----------------
ABataev wrote:
> I think better to check that all elements are `InsertElementInst` and move this check closer  to the end of the function
If any of elements is `InsertElementInst`, they are all the same, so checking one is enough. And we can't move it closer to the end, since the next check can return true.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4204
+    if (auto *Insert = dyn_cast<InsertElementInst>(EU.Scalar)) {
+      auto &Scalars = getTreeEntry(EU.Scalar)->Scalars;
+      SmallVector<int> Mask(Scalars.size(), -1);
----------------
ABataev wrote:
> Better to create a set here
Thanks, done

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4205
+      auto &Scalars = getTreeEntry(EU.Scalar)->Scalars;
+      SmallVector<int> Mask(Scalars.size(), -1);
+      int MinIndex = INT_MAX;
----------------
ABataev wrote:
> Use `UndefMaskElem` instead of `-1`
Thanks, done

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4227-4228
+        auto *VecTy = cast<FixedVectorType>(EU.Scalar->getType());
+        ExtractCost += TTI->getShuffleCost(
+            TargetTransformInfo::SK_PermuteSingleSrc, VecTy, Mask);
+      }
----------------
ABataev wrote:
> Check for the kind of permutation in `Mask`, it may be an Identity of a Reverse kind.
It's definitely not an Identity and not a Reverse here.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4575
     ScalarTy = Store->getValueOperand()->getType();
+  else if (InsertElementInst *IE = dyn_cast<InsertElementInst>(VL0))
+    ScalarTy = IE->getOperand(1)->getType();
----------------
ABataev wrote:
> `auto *IE`
Thanks, done

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4636-4678
+    case Instruction::InsertElement: {
+      Builder.SetInsertPoint(VL0);
+      Value *V = vectorizeTree(E->getOperand(0));
+
+      const int NumElts =
+          cast<FixedVectorType>(VL0->getType())->getNumElements();
+      const int NumScalars = E->Scalars.size();
----------------
ABataev wrote:
> Can we leave it to InstCombiner and return the original insrtelement instruction as a result?
Do you mean to come back to extracts generating here?
I don't believe it's a good way by several reasons:
1) It doesn't match the common logic of vectorization (changing scalar operations to one vector operation),
2) If we leave insertelements unvectorized (i.e. replaced by shuffles), these inserts may be somehow processed again,
3) If we can deal with it in SLPVectorizer, why not to do it early? It's not too hard to do.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5189-5190
+
+      LLVM_DEBUG(dbgs() << "SLP: Replaced:" << *User << ".\n");
+    } else if (auto *VecI = dyn_cast<Instruction>(Vec)) {
       if (PHINode *PH = dyn_cast<PHINode>(User)) {
----------------
ABataev wrote:
> ```
>   continue;
> }
> if (...) {
> ```
Thanks, done

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6525-6527
+      ArrayRef<Value *> Ops = InsertUses.empty()
+                                  ? VL.slice(I, OpsWidth)
+                                  : InsertUses.slice(I, OpsWidth);
----------------
ABataev wrote:
> Why not just pass `InsertUses` as ёМДё here and drop `InsertUses` parameter completely?
Yes, I've planned to do this, but can't: we make a decision about vectorization based on the _operands_:
```
  InstructionsState S = getSameOpcode(VL);
  if (!S.getOpcode())
    return false;
 ...
 unsigned Sz = R.getVectorElementSize(I0);
```
but then make vectorization starting from inserts.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714