[PATCH] D115713: [LV] Don't apply "TinyTripCountVectorThreshold" for loops with compile time known TC.

Thu Nov 3 01:13:45 PDT 2022

ebrevnov added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9870-9875
+      // If trip count is known compile time constant there is no any extra
+      // overhead from vectorization with scalar epilogue. Let cost model to
+      // decide on profitability in that case.
+      auto TC = SE.getSmallConstantTripCount(L);
+      if (!TC) {
+        if (*ExpectedTC <= TTI->getMinTripCountTailFoldingThreshold()) {
----------------
dmgreen wrote:
> ebrevnov wrote:
> > paulwalker-arm wrote:
> > > I don't quite understand this change.  The whole point of `getMinTripCountTailFoldingThreshold()` was to give targets control over this behaviour based on their understanding how the cost model has been implemented.
> > > 
> > > Admittedly this was in part due to the immaturity of the cost modelling but this change essentially removes that flexibility to the point where there's no value in keeping `getMinTripCountTailFoldingThreshold()`?
> > > 
> > > If your previous patches improve the cost model in this regard then I'd rather `getMinTripCountTailFoldingThreshold()` be removed.  That said, @dtemirbulatov can you help here in ascertaining if this option is still required based on this new patch series?
> > I think you are missing one thing. Currently we allow vectorization of short trip count loops if there is  no run-time overhead and the tail can be folded only. Introduction of getMinTripCountTailFoldingThreshold put additional limitations on vectorization of such loops. This change enables vectorization of short trip count loops with scalar epilogue ONLY If trip count is compile time constant. getMinTripCountTailFoldingThreshold is still applicable if we decide to vectorize by folding the tail.
> > 
> > In other words, this change enables exactly the same way of vectorization for loops with small constant trip count as for loops with unknown or large trip count.
> > 
> > It's a question whether getMinTripCountTailFoldingThreshold threshold should be taken into account if it's decided to fold the tail at step 5) for short trip count loop. Possibly yes...  I think this is your original concern, right?
> Are you assuming that the loop will be unrolled after vectorization, and that unrolling will leave no extra overhead? Scalable vector loops cannot (currently) be unrolled.
Actually no. Even if we keep vector loop which runs just 1 or 2 iterations I don't see any additional overhead comparing to scalar loop. At least not on IR level. Is there any overhead specific to MVE not visible on IR level?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115713/new/

https://reviews.llvm.org/D115713