[PATCH] D57779: [SLP] Add support for throttling.

Vasileios Porpodas via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 20 14:38:18 PDT 2019


vporpo added a comment.

I think the first throttling patch should implement a very simple and fast algorithm for finding the cut:

1. Add new fields to TreeEntry for Cost, ExtractCost and PredecessorsCost.
2. During getTreeCost(), set TE.Cost and TE.ExtractCost (as you did in an earlier version of the patch, if I am not mistaken).
3. Do a single top-down traversal of the tree in reverse postorder and set TE.PredecessorsCost to the sum of the costs of all predecessors up to TE. While doing so, you can evaluate the cut just below TE by comparing the gather cost of TE against Cost + PredecessorsCost. This is very fast, as you only need to visit each TreeEntry node once, so the complexity is linear in the size of the tree.

For example, in slp-throttle.ll the bundle that needs to be scalarized, [%add19, %sub22], has Cost=1, ExtractCost=0 and PredecessorsCost=1 (because of bundle [%mul18, undef]). Cutting below the bundle has a cost of +1, while keeping it vectorized has a cost of +2 (Cost=1 + PredecessorsCost=1), so the cut is the cheaper option.
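
To make the idea concrete, here is a rough standalone sketch of the single-pass cut search. It is not code from the patch: `Entry`, `Preds`, `GatherCost`, `findCut` and `makeThrottleExample` are hypothetical names, the entries are assumed to form a tree stored so that predecessors come before the entry that depends on them, and the gather cost of [%mul18, undef] is an assumed value chosen only for illustration. ExtractCost is carried along but not used in the comparison, matching the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SLPVectorizer's TreeEntry. Only the fields
// proposed in this comment are modeled, plus an assumed GatherCost.
struct Entry {
  int Cost = 0;                   // cost of vectorizing this bundle
  int ExtractCost = 0;            // cost of extracts for external users
  int GatherCost = 0;             // cost of gathering this bundle instead
  int PredecessorsCost = 0;       // filled in by findCut()
  std::vector<std::size_t> Preds; // indices of predecessors (all < own index)
};

// Visits every entry exactly once, so the work is linear in the number of
// entries (assuming a tree, where each entry is a predecessor of at most one
// other entry). Returns the index of the first entry for which gathering is
// cheaper than keeping it vectorized, or Entries.size() if there is none.
std::size_t findCut(std::vector<Entry> &Entries) {
  for (std::size_t I = 0; I < Entries.size(); ++I) {
    Entry &TE = Entries[I];
    TE.PredecessorsCost = 0;
    for (std::size_t P : TE.Preds)
      TE.PredecessorsCost += Entries[P].Cost + Entries[P].PredecessorsCost;
    // Cut just below TE if gathering it beats Cost + PredecessorsCost.
    if (TE.GatherCost < TE.Cost + TE.PredecessorsCost)
      return I;
  }
  return Entries.size();
}

// The two bundles from the slp-throttle.ll example above.
std::vector<Entry> makeThrottleExample() {
  std::vector<Entry> Entries(2);
  // [%mul18, undef]: Cost=1 as stated above; GatherCost=2 is an assumption.
  Entries[0].Cost = 1;
  Entries[0].GatherCost = 2;
  // [%add19, %sub22]: Cost=1, ExtractCost=0, gather cost +1.
  Entries[1].Cost = 1;
  Entries[1].ExtractCost = 0;
  Entries[1].GatherCost = 1;
  Entries[1].Preds = {0};
  return Entries;
}
```

With these numbers, findCut() stops at the [%add19, %sub22] entry: its PredecessorsCost becomes 1, and the gather cost of 1 is lower than Cost + PredecessorsCost = 2.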

This should be good enough for most simple cases. We can improve it later, if needed, with follow-up patches.
What do you think?



================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3521
+  // Reduce the path from branch L to find a profitable cut.
+  for (unsigned I = 0; I < Path.size(); I++) {
+    SmallVector<unsigned, 2> SubTree(Vecs);
----------------
This is calculating every possible cut in the path to the leaf. However, some paths will probably share the nodes close to the root, so we are recomputing the same thing multiple times.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:5523
+
+    // Partially vectorize trees after all full vectorization is done,
+    // otherwise, we could prevent more profitable full vectorization with
----------------
I don't think throttling should be visible at this level. It should be called after the call to getTreeCost().


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57779/new/

https://reviews.llvm.org/D57779




