[PATCH] D101726: [LV] Account for tripcount when calculating vectorization profitability

Wed May 5 06:34:17 PDT 2021

sdesmalen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5890
+
+  if (!A.Width.isScalar() && !B.Width.isScalable() && FoldTailByMasking &&
+      MaxTripCount) {
----------------
isScalable?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5894
+    // constant, the trip count will be rounded up to an integer number of
+    // iterations. The total cost will be PerIterCost * ceil(TripCount / VF),
+    // which we compare directly. When not folding the tail, the total cost will
----------------
nit: PerVectorIterCost

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5896
+    // which we compare directly. When not folding the tail, the total cost will
+    // be PerIterCost*floor(TC/VF) + Scalar remainder cost, and so is
+    // approximated with the per-lane cost below instead of using the tripcount
----------------
nit: PerVectorIterCost
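
For reference, the non-folded formula in this comment could be sketched roughly as follows. This is only an illustration, not code from the patch: `unfoldedTotalCost` and `ScalarIterCost` are hypothetical names, and charging the remainder as `ScalarIterCost * (TC % VF)` is an assumption about how the scalar epilogue would be costed.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LoopVectorize.cpp.
// Models the total cost when the tail is NOT folded:
//   PerVectorIterCost * floor(TC / VF) + scalar remainder cost.
// Assumption: the remainder costs ScalarIterCost per leftover iteration.
constexpr uint64_t unfoldedTotalCost(uint64_t PerVectorIterCost,
                                     uint64_t ScalarIterCost,
                                     uint64_t TripCount, uint64_t VF) {
  uint64_t VectorIters = TripCount / VF;    // floor(TC / VF)
  uint64_t RemainderIters = TripCount % VF; // iterations left for the scalar loop
  return PerVectorIterCost * VectorIters + ScalarIterCost * RemainderIters;
}
```

The scalar term depends on the remainder rather than on VF alone, which may be why the comment falls back to the per-lane approximation in this case.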

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5900
+    int64_t RTCostA =
+        CostA * divideCeil(MaxTripCount, A.Width.getKnownMinValue());
+    int64_t RTCostB =
----------------
nit: `getFixedValue` (here and below)

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5890
+
+  if (FoldTailByMasking && MaxTripCount) {
+    // If we are folding the tail and the trip count is a known (possibly small)
----------------
dmgreen wrote:
> sdesmalen wrote:
> > I think this only makes sense if both factors are fixed-width VFs?
> > If so, please add this as a condition and use `getFixedValue()` in the cost-calculation.
> Hmm. Sure. Sounds good!
> 
> I think we may need something similar for scalable vectors too, eventually. They will run into the same issue with low trip-count loops. It will just not be as obvious what the actual vector width is.
We may be able to use knowledge about the scalable vectors' runtime width from the vscale_range attribute. When we know nothing about the runtime VF, then I'm not sure if we can make any sensible decisions.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5895
+    uint64_t RTCostA =
+        CostA * divideCeil(MaxTripCount, A.Width.getKnownMinValue());
+    uint64_t RTCostB =
----------------
dmgreen wrote:
> sdesmalen wrote:
> > should this be `B` (and the one below be `A`)?
> I'm not sure I follow why. Can you give some more details on which part would be B/A?
Sorry, please ignore that comment. You're not calculating the "cost per lane" (like we do below, which switches B/A), but rather calculating the total cost for handling TC scalar iterations by doing ceil(TC/VF) vector iterations.
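
To spell out the distinction, here is a minimal standalone sketch of the two comparisons. It is not the patch itself: `divideCeilSketch`, `foldedTailTotalCost`, `isCheaperWithFoldedTail` and `isCheaperPerLane` are hypothetical names, plain integers stand in for `InstructionCost` and `ElementCount`, and both VFs are assumed to be fixed-width.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::divideCeil: rounds Numerator / Denominator up.
constexpr uint64_t divideCeilSketch(uint64_t Numerator, uint64_t Denominator) {
  return (Numerator + Denominator - 1) / Denominator;
}

// Total cost with a folded tail: ceil(TC / VF) vector iterations,
// each costing PerVectorIterCost.
constexpr uint64_t foldedTailTotalCost(uint64_t PerVectorIterCost,
                                       uint64_t TripCount, uint64_t VF) {
  return PerVectorIterCost * divideCeilSketch(TripCount, VF);
}

// Compares total costs directly, so A and B are not switched.
constexpr bool isCheaperWithFoldedTail(uint64_t CostA, uint64_t VFA,
                                       uint64_t CostB, uint64_t VFB,
                                       uint64_t TripCount) {
  return foldedTailTotalCost(CostA, TripCount, VFA) <
         foldedTailTotalCost(CostB, TripCount, VFB);
}

// Per-lane comparison: CostA / VFA < CostB / VFB, cross-multiplied to avoid
// division. This is where the A and B widths end up switched.
constexpr bool isCheaperPerLane(uint64_t CostA, uint64_t VFA,
                                uint64_t CostB, uint64_t VFB) {
  return CostA * VFB < CostB * VFA;
}
```

With made-up numbers TC = 4, A = {cost 10, VF 4} and B = {cost 16, VF 8}, the per-lane comparison prefers B (2 per lane against 2.5), while the total-cost comparison prefers A (10 against 16), since B still needs one full vector iteration to cover only four scalar iterations.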

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101726/new/

https://reviews.llvm.org/D101726