[PATCH] D57779: [SLP] Add support for throttling.

Mon Dec 7 09:16:13 PST 2020

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4290
+  auto Cmp = [](const TreeEntry *LHS, const TreeEntry *RHS) {
+    return LHS->Cost > RHS->Cost;
+  };
----------------
dtemirbulatov wrote:
> ABataev wrote:
> > dtemirbulatov wrote:
> > > ABataev wrote:
> > > > Not sure that this is the best criterion. I think you also need to include the distance from the head of the tree to the entry, because some big costs can be compensated by the vectorizable nodes in the tree.
> > > > What I would do here is just some kind of level ordering search (BFS) starting from the deepest level.
> > > Hmm, implemented, but I don't see any benefit from that, plus we have to do BFS search. And we are going to throw away any non-vectorizable nodes at 4295.
> > It may trigger for targets like silvermont or in future for vectorized functions.
> I measured the BFS approach vs this implementation. And with BFS, it is ~10% less efficient on SPEC2006 INT and ~20% less on compilable SPEC2006 FP. By efficiency, I mean the total number of reduced trees while the whole compilation.
Could you post it anyway to check if it may be improved?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4297
+        Entry->State != TreeEntry::ScatterVectorize) || Entry->Cost <= 0 ||
+        !Entry->Idx)
+      continue;
----------------
dtemirbulatov wrote:
> anton-afanasyev wrote:
> > dtemirbulatov wrote:
> > > ABataev wrote:
> > > > dtemirbulatov wrote:
> > > > > ABataev wrote:
> > > > > > I think you can also exclude entries with the number of operands <= 1.
> > > > > But why? The only thing that matters here is the cost.
> > > > Because the main idea is to drop gathers and drop one gather in favor of another one will not be profitable for sure. But it may improve compile time and the list of candidates, The only case you need to check for is the latest masked gather case, it may be profitable to convert it to gathers for some targets.
> > > I think I can check here if scutter/gather is supported via TargetrInfo and if it is not then move all nodes with TreeEntry::ScatterVectorize to  TreeEntry::Gather.
> > I believe it's wrong decision to check scatter/gather target support for the reason mentioned here https://reviews.llvm.org/D92701#2435573. Why could not we just rely on costs (node cost and total one)?
> I agree with @anton-afanasyev here. I am not sure what @ABataev wants here? If I exclude (operands <= 1) then we would lose have of all tests in SLP affected by throttling.  
I did not say anything about checking if scatter is supported here. I just said that we can improve the criterion here by checking that the entry node has at least 2 operands (because if it has just one operand, most probably we can skip it) and we just need to check the nodes with only 1 operand if it is gather scatter node, because it may be better to represent it as simple gather.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57779/new/

https://reviews.llvm.org/D57779