[PATCH] D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic

Wed Nov 25 03:20:26 PST 2020

anton-afanasyev added a comment.

In D90445#2411961 <https://reviews.llvm.org/D90445#2411961>, @vdmitrie wrote:

> The current SLP vectorizer has a significant drawback in its cost modeling, and this patch highlights it.
> Consider four scalar loads of type i8. With the prior approach (vectorization overhead), the cost of such an entry was 4 (x86 target).
> With this new approach, we have two entries instead of one: ScatterVectorize loads + NeedToGather GEPs. Their costs are 6 and 10 respectively, so the cost increases from 4 to 16.
> The problem is that once this pattern is in the tree, it pulls up the cost of the entire tree. If there are multiple such patterns in the tree, their effect is magnified. These entries eventually outweigh the possible profit of vectorizing the rest of the tree, and we end up not vectorizing it at all (even though downstream optimizations could probably turn it into optimal code). If SLP could choose between vectorization overhead and the gather intrinsic based on their costs while building the vectorizable tree, the outcome could be different.

Good point, thank you! As you said, this problem is not specific to this patch. One could fix it with a hacky cost comparison at the tree-building stage, but I believe a more general solution is preferable. Does this patch, https://reviews.llvm.org/D57779 (vectorization throttling), fix this? After greedily building the maximal tree, we choose the cheapest part of it for vectorization.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D90445