[PATCH] D90445: [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic
Anton Afanasyev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 25 03:20:26 PST 2020
anton-afanasyev added a comment.
In D90445#2411961 <https://reviews.llvm.org/D90445#2411961>, @vdmitrie wrote:
> Current SLP has significant drawback with regard to its cost modeling. And this patch highlights it.
> Consider we have four scalar loads of i8 type. With prior approach (vectorization overhead) we had cost for such entry 4 (x86 target).
> With this new approach we have two entries instead of one: ScatterVectorize loads + NeedToGather GEPs. And costs for these entries are 6 and 10 respectively, thus cost increased from 4 to 16.
> And the problem here is once we put this pattern into the tree it pulls cost up for the entire tree. If we have multiple such patterns over the tree their effect is magnified. These entries finally outweigh possible profit of vectorization for remaining portion of the tree and we end up not vectorizing it at all (even if downstream optimizations could probably change it into optimal code). If SLP could make choice vectorization overhead vs gather intrinsic based in their costs while building vectorizable tree the outcome could be different.
Good point, thank you! As you said, this problem is not specific to this patch. One could fix it with a hacky cost comparison at the tree-building stage, but I believe a more general solution is preferable. Does the vectorization throttling patch, https://reviews.llvm.org/D57779, fix this? The idea there is that after greedily building the maximal tree, we choose the cheapest part of it for vectorization.
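To make the numbers above concrete, here is a toy sketch of the two strategies. It is only an illustration under stated assumptions, not SLPVectorizer code and not the actual algorithm of D57779: the `Entry` class, the `gather_cost` field and the cost of -10 for the rest of the tree are all hypothetical. Only the costs 4, 6 and 10 come from the example quoted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """Toy stand-in for one SLP tree entry (hypothetical, not LLVM's TreeEntry)."""
    name: str
    cost: int             # cost of vectorizing this entry itself
    gather_cost: int = 0  # assumed cost of cutting here and gathering scalars instead
    operands: List["Entry"] = field(default_factory=list)


def greedy_cost(entry: Entry) -> int:
    """Cost of the whole greedily built tree: every entry is kept."""
    return entry.cost + sum(greedy_cost(op) for op in entry.operands)


def throttled_cost(entry: Entry) -> int:
    """For each operand, keep its subtree or cut it and gather, whichever is cheaper."""
    total = entry.cost
    for op in entry.operands:
        total += min(throttled_cost(op), op.gather_cost)
    return total


# Costs 6 and 10 are the two new entries, 4 is the old vectorization overhead.
geps = Entry("NeedToGather GEPs", cost=10, gather_cost=10)
loads = Entry("ScatterVectorize loads", cost=6, gather_cost=4, operands=[geps])
# Assumption: the remaining part of the tree saves 10 when vectorized.
root = Entry("rest of tree", cost=-10, operands=[loads])

print("greedy:", greedy_cost(root))
print("throttled:", throttled_cost(root))
```

In this toy model the greedy tree pays 6 + 10 = 16 for the loads and ends up with a positive total, so it would be rejected, while choosing between the two alternatives per subtree pays 4 and keeps the total negative, i.e. profitable.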