[PATCH] D115713: [LV] Don't apply "TinyTripCountVectorThreshold" for loops with compile time known TC.
Evgeniy via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 7 04:45:23 PST 2022
ebrevnov added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9870-9875
+ // If trip count is known compile time constant there is no any extra
+ // overhead from vectorization with scalar epilogue. Let cost model to
+ // decide on profitability in that case.
+ auto TC = SE.getSmallConstantTripCount(L);
+ if (!TC) {
+ if (*ExpectedTC <= TTI->getMinTripCountTailFoldingThreshold()) {
----------------
dmgreen wrote:
> ebrevnov wrote:
> > dmgreen wrote:
> > > ebrevnov wrote:
> > > > paulwalker-arm wrote:
> > > > > I don't quite understand this change. The whole point of `getMinTripCountTailFoldingThreshold()` was to give targets control over this behaviour based on their understanding how the cost model has been implemented.
> > > > >
> > > > > Admittedly this was in part due to the immaturity of the cost modelling but this change essentially removes that flexibility to the point where there's no value in keeping `getMinTripCountTailFoldingThreshold()`?
> > > > >
> > > > > If your previous patches improve the cost model in this regard then I'd rather `getMinTripCountTailFoldingThreshold()` be removed. That said, @dtemirbulatov can you help here in ascertaining if this option is still required based on this new patch series?
> > > > I think you are missing one thing. Currently we allow vectorization of short trip count loops if there is no run-time overhead and the tail can be folded only. Introduction of getMinTripCountTailFoldingThreshold put additional limitations on vectorization of such loops. This change enables vectorization of short trip count loops with scalar epilogue ONLY If trip count is compile time constant. getMinTripCountTailFoldingThreshold is still applicable if we decide to vectorize by folding the tail.
> > > >
> > > > In other words, this change enables exactly the same way of vectorization for loops with small constant trip count as for loops with unknown or large trip count.
> > > >
> > > > It's a question whether getMinTripCountTailFoldingThreshold threshold should be taken into account if it's decided to fold the tail at step 5) for short trip count loop. Possibly yes... I think this is your original concern, right?
> > > Are you assuming that the loop will be unrolled after vectorization, and that unrolling will leave no extra overhead? Scalable vector loops cannot (currently) be unrolled.
> > Actually no. Even if we keep vector loop which runs just 1 or 2 iterations I don't see any additional overhead comparing to scalar loop. At least not on IR level. Is there any overhead specific to MVE not visible on IR level?
> The getMinTripCountTailFoldingThreshold was added for SVE, it is not (currently) used in MVE.
>
> There will be a certain amount of overhead just from having a second loop, let alone all the extra overhead from the gubbins around it. https://godbolt.org/z/s8xEf7d6K. None of that is particularly MVE specific though, although as it is designed for small energy-efficient cores it may feel the overheads more than other architectures.
It looks like, in this specific example, vector cost of the main vector loop being estimated as 2x lower than scalar one is pretty much accurate. The problem is that extra cost coming from horizontal reduction after main vector loop is not taken into account. This is something that definitely should be fixed before we can move on in this direction. Essentially, this problem is a major blocking factor to enable short TC loops vectorization in general case (not only compile time known TC).
PS: I wonder if your problematic case on MVE is something similar to this or not?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115713/new/
https://reviews.llvm.org/D115713
More information about the llvm-commits
mailing list