[PATCH] D101726: [LV] Account for tripcount when calculation vectorization profitability

Wed May 5 07:29:36 PDT 2021

dmgreen marked 3 inline comments as done.
dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5890
+
+  if (!A.Width.isScalar() && !B.Width.isScalable() && FoldTailByMasking &&
+      MaxTripCount) {
----------------
sdesmalen wrote:
> isScalable?
Doh!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5894
+    // constant, the trip count will be rounded up to an integer number of
+    // iterations. The total cost will be PerIterCost * ceil(TripCount / VF),
+    // which we compare directly. When not folding the tail, the total cost will
----------------
sdesmalen wrote:
> nit: PerVectorIterCost
These ones can be scalar too.
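
For reference, a minimal standalone sketch of the comparison the code comment describes when folding the tail: total cost = PerIterCost * ceil(TripCount / VF). This is not the patch's code. `Candidate`, `divideCeil`, `totalCostFoldingTail` and `isCheaperFoldingTail` are hypothetical names used only for illustration, and widths are assumed to be fixed (a width of 1 stands for the scalar loop).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fixed-width vectorization factor and its cost.
struct Candidate {
  uint64_t Width;       // number of lanes; 1 means the scalar loop
  uint64_t PerIterCost; // cost of one iteration of the loop body at this width
};

// ceil(A / B) for B > 0.
static uint64_t divideCeil(uint64_t A, uint64_t B) {
  assert(B > 0 && "width must be non-zero");
  return (A + B - 1) / B;
}

// With tail folding, the trip count is rounded up to a whole number of
// (possibly vector) iterations: PerIterCost * ceil(TripCount / Width).
static uint64_t totalCostFoldingTail(const Candidate &C, uint64_t TripCount) {
  return C.PerIterCost * divideCeil(TripCount, C.Width);
}

// Returns true if A is strictly cheaper than B for the given trip count.
static bool isCheaperFoldingTail(const Candidate &A, const Candidate &B,
                                 uint64_t TripCount) {
  return totalCostFoldingTail(A, TripCount) <
         totalCostFoldingTail(B, TripCount);
}
```

With a trip count of 5, a width of 4 costing 10 per iteration totals 20, while a width of 16 costing 24 per iteration totals 24, so the narrower factor wins even though its per-lane cost is higher. At a trip count of 64 the totals are 160 and 96, and the wider factor wins.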

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5890
+
+  if (FoldTailByMasking && MaxTripCount) {
+    // If we are folding the tail and the trip count is a known (possibly small)
----------------
sdesmalen wrote:
> dmgreen wrote:
> > sdesmalen wrote:
> > > I think this only makes sense if both factors are fixed-width VFs?
> > > If so, please add this as a condition and use `getFixedValue()` in the cost-calculation.
> > Hmm. Sure. Sounds good!
> > 
> > I think we may need something similar for scalable vectors too, eventually. They will run into the same issue with low trip-count loops. It will just not be as obvious what the actual vector width is.
> We may be able to use knowledge about the scalable vectors' runtime width from the vscale_range attribute. When we know nothing about the runtime VF, then I'm not sure if we can make any sensible decisions.
Yep, but comparing a non-scalable VF and a scalable VF will be wrong whichever method we choose, unless the scalable factor happens to be 1. I always imagined that the backend TTI would tell the vectorizer the correct vscale to use if it was known from -mcpu, or guess at a likely one if not (which would probably be 1 or 2 at the moment).
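
To illustrate the point about vscale, here is a rough standalone sketch of how an assumed vscale would feed into the same trip-count calculation. It is not LLVM's API: `Factor`, `estimatedLanes`, `estimatedTotalCost` and the `AssumedVScale` parameter are hypothetical, standing in for whatever value the target would report or guess.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a VF: a minimum lane count plus a scalable flag.
struct Factor {
  uint64_t MinLanes; // lanes for a fixed VF, or lanes per vscale unit
  bool Scalable;     // true if the real width is MinLanes * vscale
};

// Estimated runtime lane count under an assumed vscale (an assumption, e.g.
// known from the target CPU or guessed as 1 or 2).
static uint64_t estimatedLanes(const Factor &F, uint64_t AssumedVScale) {
  assert(AssumedVScale > 0 && "vscale must be non-zero");
  return F.Scalable ? F.MinLanes * AssumedVScale : F.MinLanes;
}

// PerIterCost * ceil(TripCount / estimated lanes), as in the tail-folded case.
static uint64_t estimatedTotalCost(const Factor &F, uint64_t PerIterCost,
                                   uint64_t TripCount,
                                   uint64_t AssumedVScale) {
  uint64_t Lanes = estimatedLanes(F, AssumedVScale);
  return PerIterCost * ((TripCount + Lanes - 1) / Lanes);
}
```

With a trip count of 8 and a per-iteration cost of 10, a fixed VF of 4 and a scalable VF of 4 both total 20 when vscale is assumed to be 1. Assuming vscale is 2 drops the scalable total to 10, which is why the comparison depends on the estimate.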

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101726/new/

https://reviews.llvm.org/D101726