[PATCH] D57779: [SLP] Add support for throttling.

Thu Oct 17 08:40:36 PDT 2019

ABataev added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:645
+  /// Save seed instructions to try partially vectorize later.
+  void recordSeeds(ArrayRef<Value *> Ops);
+
----------------
dtemirbulatov wrote:
> dtemirbulatov wrote:
> > dtemirbulatov wrote:
> > > vporpo wrote:
> > > > dtemirbulatov wrote:
> > > > > vporpo wrote:
> > > > > > Why are we collecting the seeds specifically for partial vectorization? Is this really needed? Why don't you just call tryPartialVectorization(Seeds) within vectorizeStorChain() etc. ?
> > > > > Ok, Imagine that we could do partial vectorization with vector size 4 for 30% of the tree, but for the same tree, we could have full vectorization with vector size 2. I think that it would be serious regression if we could do it with just 30%. Or for example, for the same tree, we could do reduction later for the whole tree.
> > > > Yes, but this problem is not specific to throttling. As it is now, SLP will greedily accept 4-wide vectorization if profitable, without comparing it against 2x 2-wide. I think this problem should be addressed separately.
> > > Let's change that behavior separately or this change grows even further. BTW, I measured the SLP run-time change on SPEC2k6 compilation and it is about ~10% on average.
> > and It is not clear to me how to compare the benefit of one vectorization against another, the same example we achieved 50% of tree 4 wide vectorization with still profitable cost, let's say,  -1. But, for the same tree, we have 90% of vectorization with vector size 2 with for example the same cost. We could not say let's pick one with the highest score, we are interesting to vectorize while it is profitable.
> Maybe we could have a subjective score based upon a vectorized tree-hight and vector widening.
I agree with Vasileios here, we should follow the same approach as general SLP vectorization. Otherwise, the compile time increases significantly.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1303
+    /// The index containing the use of this entry.
+    SmallVector<TreeEntry *, 1> UseTreeIndices;
+
----------------
`TinyPtrVector<TreeEntry *>`?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1416
+    // Find nodes with more than one use.
+    bool isBranch() {
+      return llvm::count_if(UseTreeIndices, [this](TreeEntry *Next) {
----------------
const member function.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1473-1474
     Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
     Last->NeedToGather = !Vectorized;
+    Last->ProposedToGather = !Vectorized;
     Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
----------------
Maybe, it is better to merge these 2 flags into one and use enumeric instead of booleans? Seems to me, these flags are mutually exclusive.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2920
+  SmallVector<TreeEntry *, 4> VecNodes;
+  for (auto &TEPtr : VectorizableTree) {
+    TreeEntry *Entry = TEPtr.get();
----------------
`auto`->real type

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2927-2933
+  auto It = llvm::find_if(VecNodes, [](TreeEntry *E) {
+    Instruction *Inst = E->getMainOp();
+    return (Inst && (isa<BinaryOperator>(Inst) || isa<FPMathOperator>(Inst) ||
+                     isa<CmpInst>(Inst)));
+  });
+  if (It == VecNodes.end())
+    return false;
----------------
You can do this analysis inside the first loop, no?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2974
+                          << ".\n");
+        ExternalUses.emplace_back(ExternalUser(Scalar, U, Lane));
+      }
----------------
Remove call of `ExternalUser` constructor, just `ExternalUses.emplace_back(Scalar, U, Lane);`

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3004-3005
+    // decisions.
+    if (Cost.first < -SLPCostThreshold)
+      continue;
+    for (auto &TEPtr : VectorizableTree) {
----------------
I think it must be an assert, that `Cost.first >= -SLPCostThreshold` here.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3006
+      continue;
+    for (auto &TEPtr : VectorizableTree) {
+      TreeEntry *TE = TEPtr.get();
----------------
`auto`->real type

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3018-3024
+      SmallString<256> Str;
+      raw_svector_ostream OS(Str);
+      OS << "SLP: Partial vectorization with Total Cost = "
+         << PartialCost.getValue() << ".\n";
+      LLVM_DEBUG(dbgs() << Str);
+      if (ViewSLPTree)
+        ViewGraph(this, "SLP" + F->getName(), false, Str);
----------------
All this code must be compiled only in the debug mode.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3027
+
+    if (PartialCost.hasValue() && PartialCost.getValue() < -SLPCostThreshold) {
+      LLVM_DEBUG(dbgs() << "SLP: Decided to partially vectorize with cost="
----------------
Merge this `if` with the previous one.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3696
+  for (TreeEntry *N : Path) {
+    CostsRecalculations++;
+
----------------
Missed the check for `CostsRecalculations >= MaxCostsRecalculations` here

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3710
+    if (PartialCost < -SLPCostThreshold && cutTree())
+      return PartialCost;
+  }
----------------
Maybe transform the function into return `bool` and return the cost in `Cost` parameter, passed by reference?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3718
+  int NonGatherUse;
+  if (!is_contained(Path, C))
+    Path.push_back(C);
----------------
`is_contained` is used several times and not effective. Maybe you can use a set here instead of SmallVector?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57779/new/

https://reviews.llvm.org/D57779