[llvm] [LV] Vectorize Epilogues for loops with small VF but high IC (PR #108190)

Mon Sep 23 08:29:22 PDT 2024

davemgreen wrote:

> Hi, thanks for having a look! Not sure I fully understand the question; From the existing code and name my reading was that it should be the maximum trip count, since for the main loop it's using getSmallConstantMaxTripCount, which describes itself as an "upper bound of the loop trip count".

Sorry - I meant to reply to this but it fell off my radar. What I mean is that the value returned from getSmallConstantMaxTripCount is usually either a known constant trip count, or something like 2147483647 if the loop count is unknown (or it can return 0 for larger induction variables). The assumption is that it is either a good approximation for the trip count of the loop, or high enough that the vector cost dominates the total. Looking at the code, it should maybe be using getSmallBestKnownTC.

If the trip count is a known constant then everything should be good, but if it is unknown but large then I think this patch uses `VF-1` (either directly or from `2147483647 % VF` via SCEVs) to cost the epilog. The real number of iterations will be something in the range 0 to VF-1, with 0 probably being relatively common. That means we will calculate the cost based on the worst-case epilog iteration count, which for small loops like this could skew the results. I would suspect that it probably makes the vectorizer choose a smaller factor for the epilog, so that it would pick `3*VF2 + 1*scalar` instead of the `1*VF4` it would pick if VF/2 were used instead.
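
To make the arithmetic concrete, here is a rough sketch of how the assumed remainder changes the way the epilog splits into vector and scalar iterations. This is not code from the patch: `assumedRemainder`, `splitRemainder` and `EpilogSplit` are hypothetical names, and the main-loop VF of 8 is just an assumption chosen to match the `VF2`/`VF4` example.

```cpp
#include <cstdint>

// Hypothetical helper: the remainder implied by treating the "unknown"
// max trip count 2147483647 (2^31 - 1) as if it were the real trip count.
// For a power-of-two step this is always step - 1.
constexpr uint64_t assumedRemainder(uint64_t MainVF, uint64_t UF) {
  return 2147483647ull % (MainVF * UF);
}

// Hypothetical helper: how many epilog-vector and scalar iterations are
// needed to cover `Remainder` leftover iterations with epilog width EpilogVF.
struct EpilogSplit {
  uint64_t VecIters;
  uint64_t ScalarIters;
};

constexpr EpilogSplit splitRemainder(uint64_t Remainder, uint64_t EpilogVF) {
  return {Remainder / EpilogVF, Remainder % EpilogVF};
}

// Assumed main loop: VF = 8, UF = 1, so the costed remainder is 7.
static_assert(assumedRemainder(8, 1) == 7, "worst-case remainder is VF-1");

// Costing with remainder VF-1 = 7: VF2 leaves 1 scalar iteration,
// VF4 leaves 3, so VF2 tends to look cheaper.
static_assert(splitRemainder(7, 2).VecIters == 3 &&
              splitRemainder(7, 2).ScalarIters == 1, "3*VF2 + 1*scalar");
static_assert(splitRemainder(7, 4).VecIters == 1 &&
              splitRemainder(7, 4).ScalarIters == 3, "1*VF4 + 3*scalar");

// Costing with remainder VF/2 = 4 instead: a single VF4 iteration covers it.
static_assert(splitRemainder(4, 4).VecIters == 1 &&
              splitRemainder(4, 4).ScalarIters == 0, "1*VF4");
```

Which split is actually cheaper depends on the per-iteration costs the cost model reports, which this sketch leaves out on purpose.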

It might well be that VF-1 works better than any other choice; it sounds like it will pick lower VFs, which would benefit more trip counts (up to a point). I was interested in whether you had tried any other alternatives.

(A lot of the VFs above should be VF*UF.)

https://github.com/llvm/llvm-project/pull/108190