[PATCH] D101726: [LV] Account for tripcount when calculation vectorization profitability
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 5 07:29:36 PDT 2021
dmgreen marked 3 inline comments as done.
dmgreen added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5890
+
+ if (!A.Width.isScalar() && !B.Width.isScalable() && FoldTailByMasking &&
+ MaxTripCount) {
----------------
sdesmalen wrote:
> isScalable?
Doh!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5894
+ // constant, the trip count will be rounded up to an integer number of
+ // iterations. The total cost will be PerIterCost * ceil(TripCount / VF),
+ // which we compare directly. When not folding the tail, the total cost will
----------------
sdesmalen wrote:
> nit: PerVectorIterCost
These ones can be scalar too.
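
For reference, a minimal standalone sketch of the folded-tail comparison described in the code comment above, i.e. PerIterCost * ceil(TripCount / VF). This is not the code in the patch: `Candidate`, `ceilDiv`, `foldedTailCost` and `isMoreProfitableFolded` are hypothetical names, widths are assumed to be fixed (non-scalable), and costs are plain integers rather than LLVM's cost types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a candidate factor (not an LLVM type).
struct Candidate {
  uint64_t Width;       // fixed VF; 1 means scalar
  uint64_t PerIterCost; // cost of one loop iteration at this width
};

// ceil(N / D) using integer arithmetic only.
constexpr uint64_t ceilDiv(uint64_t N, uint64_t D) { return (N + D - 1) / D; }

// Total cost when the tail is folded by masking: the trip count is
// rounded up to a whole number of iterations of width C.Width.
uint64_t foldedTailCost(const Candidate &C, uint64_t MaxTripCount) {
  assert(C.Width > 0 && "width must be non-zero");
  return C.PerIterCost * ceilDiv(MaxTripCount, C.Width);
}

// True if A is strictly cheaper than B for the given trip count.
bool isMoreProfitableFolded(const Candidate &A, const Candidate &B,
                            uint64_t MaxTripCount) {
  return foldedTailCost(A, MaxTripCount) < foldedTailCost(B, MaxTripCount);
}
```

With made-up costs of 10 per iteration at VF=4 and 16 at VF=8, a trip count of 5 gives totals of 20 and 16, so VF=8 wins; a trip count of 4 gives 10 and 16, so VF=4 wins. That flip is the low trip-count effect the patch is trying to capture.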
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5890
+
+ if (FoldTailByMasking && MaxTripCount) {
+ // If we are folding the tail and the trip count is a known (possibly small)
----------------
sdesmalen wrote:
> dmgreen wrote:
> > sdesmalen wrote:
> > > I think this only makes sense if both factors are fixed-width VFs?
> > > If so, please add this as a condition and use `getFixedValue()` in the cost-calculation.
> > Hmm. Sure. Sounds good!
> >
> > I think we may need something similar for scalable vectors too, eventually. They will run into the same issue with low trip-count loops. It will just not be as obvious what the actual vector width is.
> We may be able to use knowledge about the scalable vectors' runtime width from the vscale_range attribute. When we know nothing about the runtime VF, then I'm not sure if we can make any sensible decisions.
Yep, but comparing a non-scalable VF and a scalable VF will be wrong whichever method we choose, unless the scalable factor happens to be 1. I always imagined that the backend TTI would tell the vectorizer the correct vscale to use if it was known from -mcpu, or guess at a likely one if not (which would probably be 1 or 2 at the moment).
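
To illustrate the point about needing a vscale value, here is a rough sketch of how a fixed and a scalable factor could be put on the same footing once some vscale is assumed. Everything here is hypothetical: `Factor` and `estimatedLanes` are illustration-only names, and `AssumedVScale` stands in for whatever value a target might report or guess; none of this is an existing LLVM interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a vectorization factor (not an LLVM type).
struct Factor {
  uint64_t MinWidth; // known minimum number of lanes
  bool Scalable;     // if true, actual lanes = MinWidth * vscale
};

// Estimated number of lanes per iteration under an assumed vscale.
// AssumedVScale is an assumption supplied by the caller, e.g. a value
// known from the CPU or a likely guess.
uint64_t estimatedLanes(const Factor &F, uint64_t AssumedVScale) {
  assert(AssumedVScale > 0 && "vscale must be at least 1");
  return F.Scalable ? F.MinWidth * AssumedVScale : F.MinWidth;
}
```

With an assumed vscale of 1, a scalable factor with minimum width 4 and a fixed VF of 4 have the same estimated width, which is the only case where comparing them directly is not misleading; with an assumed vscale of 2 the scalable factor covers 8 lanes per iteration.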
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101726/new/
https://reviews.llvm.org/D101726