[PATCH] D57779: [SLP] Add support for throttling.

Mon Mar 18 20:41:46 PDT 2019

vporpo added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:527
   /// A negative number means that this is profitable.
-  int getTreeCost();
+  int getTreeCost(bool ReduceTree = false);

----------------
Could you use a cl::opt flag instead of this ReduceTree function argument?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2450

+bool BoUpSLP::reduceTree() {
+  // Estimating where to stop vectorization.
----------------
I am nitpicking here, but "reduce" is kind of overloaded in the SLP pass. We also have tryToReduce(), which is for reduction.
Please try to look for a more unique name (maybe shrinkTree(), or cutTree(), or throttleTree() or something similar)

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2471
+    if (VectorizableTree[I].ScalarizeAtCost < -SLPCostThreshold &&
+        NonGather[I]>1) {
+      MinCost = VectorizableTree[I].ScalarizeAtCost;
----------------
Could you check this in a better way, without using an additional 'NonGather' vector? Maybe introduce a small helper function isJustTineyTree(I) that checks a few entries below 'I'.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2680
+    // Avoid non-vectorized scalars for this tree hight.
+    if (ScalarsToVec.count(EU.Scalar) == 0) {
+      // Consider the possibility of extracting vectorized
----------------
if (!ScalarsToVec.count(EU.Scalar))

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2683
+      // values for canceled elements use.
+      if (InternalTreeUses.find(EU.Scalar) != InternalTreeUses.end())
+        for (ExternalUser &IU : InternalTreeUses[EU.Scalar])
----------------
As far as I understand, you are calculating the additional cost for the EU.Scalars that are no longer vectorized (because of reduceTree()).
So you are getting all its in-tree uses and you are calculating the extract operation cost from for the data transfer from EU.Scalar->vector use.
1. Isn't this an 'insert' operation and not an 'extract' ?
2. Isn't this just the gather cost that has already been considered in line 2756 ?
3. Even if you have multiple in-tree vectorized uses, we should only count the gather cost once, as the vector can then be reused for the rest of the uses with no extra cost.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2739
+      for (Value *V : Entry->Scalars)
+        ScalarsToVec.insert(V);
+      auto It = UseTreeIndices.find(I);
----------------
I think that introducing the 'ScalarsToVec' to hold the vectorizable scalars "below the cut" could be done better. In particular I don't like that it is local to getTreeCost() and gets carried around in many functions, and that it is some extra state that is not very visible.

Isn't it better to add a bool 'IsThrottled' flag in TreeEntry? This way:
1. you don't need to carry anything around and,
2. one can easily tell if an entry gets vectorized or not, at a given time, just by looking at the Entry
Also you can simply check if a V is vectorized by E = getTreeEntry(V); if (E && !E->isThrottled).

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2748
+
+    if (ReduceTree || I == (E - 1)) {
+      Entry->ScalarizeAtCost = CostSum;
----------------
If ReduceTree is false, is the cost calculation correct (no  extract cost?) ? 

In any case if ReduceTree == false, I think it is better if we compute the cost as in the original code, without having to compute extractCost() etc. for every single 'cut' of the tree.

Also consider adding some of the costs as member variables in TreeEntry: like ExtractCost, SpillCost, GatherCost  etc.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3751
+    // vectorization decision.
+    if (RemovedOperations.size())
+      removeFromScheduling(BS);
----------------
if (!RemovedOperations.empty())

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4893
+    assert(isa<StoreInst>(V) && "Expected only StoreInst here!");
+    if (StoreInst *SI = cast<StoreInst>(V))
+      if (SI->getValueOperand())
----------------
Isn't Chain containing only stores? Why do we need the if here ?
Also, why wasn't R.getVectorElementSize(Chain[0])  OK?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4894
+    if (StoreInst *SI = cast<StoreInst>(V))
+      if (SI->getValueOperand())
+        FirstStore = V;
----------------
>From your comment of line 4920 I guess you are also checking for vectorized 'SI' instructions.
How can this be caused by reduceTree ?

If this is indeed required here (please let me know why) and try to use a function for it and reuse it for both 'Chain' and 'Operands' (line 4920) if possible.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4897
+  }
+  if (!FirstStore)
+    return false;
----------------
Why is this code needed? Can reduceTree remove all the instructions from the SLP tree? This does not sound right.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4955
+      R.getTreeCost(true);
+      Changed = R.reduceTree();
+      if (Changed) {
----------------
Could you add a cl::opt flag to guard this (and the whole reduceTree optimization).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57779/new/

https://reviews.llvm.org/D57779