[PATCH] D109296: [LV] Improve inclusivity of vectorization

Sat Sep 25 13:16:44 PDT 2021

fhahn added a comment.

In D109296#3022665 <https://reviews.llvm.org/D109296#3022665>, @lebedev.ri wrote:

> In D109296#3022656 <https://reviews.llvm.org/D109296#3022656>, @fhahn wrote:
>
>> In D109296#3022646 <https://reviews.llvm.org/D109296#3022646>, @lebedev.ri wrote:
>>
>>>> There's another option that potentially allows us to side-step the issue of not knowing the trip count. Based on the formula used in the original version of D109368 <https://reviews.llvm.org/D109368>, we can compute the minimum trip count required for the vector loop to be profitable. We can also compute a minimum trip count so that the cost of the runtime checks is only a fraction of the total scalar loop cost. We already emit a minimum iteration check, which can be adjusted with the additional computed minimums. I think that would allow us to vectorize a lot more aggressively, while still guarding against runtime checks adding a large overhead if they fail for low trip count loops. I updated D109368 <https://reviews.llvm.org/D109368> accordingly.
>>>
>>> Ok, please check if i got this right: instead of having a hard compile-time cut-off for the checks, which we believe is used to guard against compile-time (and file-size) explosion,
>>> we completely drop this limit, always vectorize, but before doing the run-time checks, we perform the trip-count checks, and if they fail, we fall back to the scalar loop?
>>
>> Yes, that should be it basically (we also skip vectorization *if* we already know that the expected trip count is less than the computed minimums; this should also include profile info). This approach was the result of an offline discussion with @Ayal.
>
> This sounds truly awesome.
>
>>> That way we incur compile-time cost, filesize bloat, but not run-time cost.
>>
>> Yep, but I don't think compile-time cost and code size increases are much to worry about, and they are not the original motivation for the cutoff; the limit on runtime checks only prevents vectorization of 1% of otherwise vectorized loops in SPEC2006/SPEC2017/MultiSource with -O3. And when optimizing for size we currently do not allow runtime checks anyway.
>
> Well okay then :)
> So i guess what i need to do is to rebase this patch on top of D109368 <https://reviews.llvm.org/D109368>, and simply methodically exterminate! exterminate! the compile-time limits instead of redesigning them, correct?

For LV, the threshold should already have been removed, but it doesn't move `RuntimeMemoryCheckThreshold`, which is a main difference to this patch AFAICT :) It would be great if you could verify it works as expected with rawspeed.
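
To make the minimum-trip-count idea quoted above concrete, here is a rough standalone sketch. It is not the code from D109368: the `LoopCosts` struct, the function names, and the `Fraction` parameter are all hypothetical, and the cost model is an assumption (the scalar loop costs `ScalarIterCost * TC`, the vector path costs `RuntimeCheckCost + VectorIterCost * TC / VF`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-loop cost summary; field names are illustrative only
// and do not correspond to LLVM's actual data structures.
struct LoopCosts {
  uint64_t ScalarIterCost;   // cost of one scalar iteration
  uint64_t VectorIterCost;   // cost of one vector iteration (VF elements)
  uint64_t RuntimeCheckCost; // one-off cost of all runtime checks
  uint64_t VF;               // vectorization factor
};

static uint64_t divideCeil(uint64_t N, uint64_t D) { return (N + D - 1) / D; }

// Smallest trip count TC at which the vector path at least breaks even:
//   ScalarIterCost * TC >= RuntimeCheckCost + VectorIterCost * TC / VF
// Returns std::nullopt if the vector body is not cheaper per element.
static std::optional<uint64_t> minProfitableTripCount(const LoopCosts &C) {
  assert(C.VF > 0 && "VF must be non-zero");
  if (C.ScalarIterCost * C.VF <= C.VectorIterCost)
    return std::nullopt;
  uint64_t Div = C.ScalarIterCost * C.VF - C.VectorIterCost;
  return divideCeil(C.RuntimeCheckCost * C.VF, Div);
}

// Smallest TC such that the runtime checks cost at most 1/Fraction of the
// total scalar loop cost:
//   RuntimeCheckCost <= ScalarIterCost * TC / Fraction
static uint64_t minTripCountForCheckOverhead(const LoopCosts &C,
                                             uint64_t Fraction) {
  assert(C.ScalarIterCost > 0 && "scalar cost must be non-zero");
  return divideCeil(C.RuntimeCheckCost * Fraction, C.ScalarIterCost);
}

// Bound to fold into the minimum iteration check: the largest of the two
// computed minimums and VF. std::nullopt means "do not vectorize".
static std::optional<uint64_t> minTripCountForVectorLoop(const LoopCosts &C,
                                                         uint64_t Fraction) {
  std::optional<uint64_t> Profitable = minProfitableTripCount(C);
  if (!Profitable)
    return std::nullopt;
  return std::max(
      {*Profitable, minTripCountForCheckOverhead(C, Fraction), C.VF});
}
```

With the made-up costs `{ScalarIterCost = 4, VectorIterCost = 6, RuntimeCheckCost = 20, VF = 4}` and `Fraction = 10`, the break-even trip count is 8 and the overhead bound is 50, so the guard in front of the runtime checks would require at least 50 iterations; loops with fewer iterations go straight to the scalar loop without executing any checks.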

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109296/new/

https://reviews.llvm.org/D109296