[PATCH] D57779: [SLP] Add support for throttling.

Thu Dec 19 09:13:24 PST 2019

dtemirbulatov marked an inline comment as done.
dtemirbulatov added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:645
+  /// Save seed instructions to try partially vectorize later.
+  void recordSeeds(ArrayRef<Value *> Ops);
+
----------------
ABataev wrote:
> ABataev wrote:
> > dtemirbulatov wrote:
> > > dtemirbulatov wrote:
> > > > dtemirbulatov wrote:
> > > > > vporpo wrote:
> > > > > > dtemirbulatov wrote:
> > > > > > > vporpo wrote:
> > > > > > > > Why are we collecting the seeds specifically for partial vectorization? Is this really needed? Why don't you just call tryPartialVectorization(Seeds) within vectorizeStorChain() etc.?
> > > > > > > Ok, imagine that we could do partial vectorization with vector size 4 for 30% of the tree, but for the same tree we could have full vectorization with vector size 2. I think it would be a serious regression if we settled for just 30%. Or, for example, we could later do a reduction over the whole of that same tree.
> > > > > > Yes, but this problem is not specific to throttling. As it is now, SLP will greedily accept 4-wide vectorization if profitable, without comparing it against 2x 2-wide. I think this problem should be addressed separately.
> > > > > Let's change that behavior separately, or this change grows even further. BTW, I measured the SLP run-time change on SPEC2k6 compilation and it is about 10% on average.
> > > > And it is not clear to me how to compare the benefit of one vectorization against another. In the same example, say we vectorized 50% of the tree 4-wide at a still-profitable cost of, let's say, -1. But for the same tree we could vectorize 90% with vector size 2 at, for example, the same cost. We could not simply say "let's pick the one with the highest score"; we are interested in vectorizing for as long as it is profitable.
> > > Maybe we could have a subjective score based on the vectorized tree height and vector widening.
> > I agree with Vasileios here, we should follow the same approach as general SLP vectorization. Otherwise, the compile time increases significantly.
> Also, I agree with Vasileios here. Why do we need to record seeds and then rebuild the same tree a second time before trying to apply partial vectorization? Why can't we reuse the previously built tree and try to cut the tree nodes one by one, or something like that, rather than repeating all the previous steps?
Ok, I don't see any regression now without recordSeeds(), and the spec 2k int numbers are also the same. I will update shortly without recordSeeds(). Thanks.
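
For illustration only, the "reuse the built tree and cut nodes one by one" idea suggested above might look roughly like the sketch below. It is not the code in this patch: `Node`, `VectorCost`, `GatherCost`, and `throttle` are hypothetical names, and it assumes the tree is linearized root-first so that dropping a suffix of the node list is always a valid cut. When a node is cut, the first cut node is assumed to be built from scalars instead, which adds its gather cost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one SLP tree node (not LLVM's TreeEntry).
struct Node {
  int VectorCost; // cost delta if the node is vectorized (negative = gain)
  int GatherCost; // cost of building this node's vector from scalars instead
};

// Assumes Tree is ordered root-first, so any prefix is a valid partial tree.
// Returns how many leading nodes to keep vectorized so that the total cost
// is below Threshold, or 0 if no prefix is profitable.
std::size_t throttle(const std::vector<Node> &Tree, int Threshold = 0) {
  // Cost of vectorizing the whole tree, computed once and reused.
  int Cost = 0;
  for (const Node &N : Tree)
    Cost += N.VectorCost;

  for (std::size_t Keep = Tree.size(); Keep > 0; --Keep) {
    int Total = Cost;
    // The first cut node (if any) is gathered from scalars.
    if (Keep < Tree.size())
      Total += Tree[Keep].GatherCost;
    if (Total < Threshold)
      return Keep;
    // Cut the last kept node and try the shorter prefix.
    Cost -= Tree[Keep - 1].VectorCost;
  }
  return 0;
}
```

With costs {-3, -2, +6} and gather costs {1, 2, 1}, the full tree costs +1 and is rejected, while keeping the first two nodes costs -5 + 1 = -4, so the sketch keeps two nodes without rebuilding the tree.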

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57779/new/

https://reviews.llvm.org/D57779