[PATCH] D154261: [LV] Consider if scalar epilogue is required in getMaximizedVFForTarget.

Tue Jul 4 13:17:32 PDT 2023

Ayal added a comment.

In D154261#4471533 <https://reviews.llvm.org/D154261#4471533>, @fhahn wrote:

> In D154261#4471308 <https://reviews.llvm.org/D154261#4471308>, @Ayal wrote:
>
>> In D154261#4468894 <https://reviews.llvm.org/D154261#4468894>, @fhahn wrote:
>>
>>> Rebase on top of e561edaaa56c9a8818d546774b141dead7224b50 <https://reviews.llvm.org/rGe561edaaa56c9a8818d546774b141dead7224b50> which updates the test in pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll so it keeps testing for the original issue.
>>
>> Ah, the original issue of pr56319 was to ensure that the epilog loop leaves at least one iteration for the scalar loop (or actually epilogVF iterations) rather than vectorizing all remaining iterations? An odd trip count of 39 will always end up with a last scalar iteration, because the main and epilog VFs are even. Perhaps a trip count of 48 with a (forced?) main-loop VF of 32 and epilog-loop VF of 8 would ensure that the epilog loop runs only once.
>
> The original issue was that we simplified the branch in the main vector loop to exit after the first iteration (VF == vector TC), and because the epilogue and main vector loops share the same VPlan, it was also simplified for the epilogue vector loop, even though that loop ran more than one vector iteration (fix was https://github.com/llvm/llvm-project/commit/0dddf04caba55a64f8534518d65311bdac05cf39).
>
> Now, the test still checks that we simplify neither the main nor the epilogue vector loop branch. But in this case we could simplify both, because the epilogue vector loop executes only a single iteration. We should also have a variant where the epilogue vector loop executes multiple times (e.g. main VF=32, epilogue VF=8, TC=49), but unfortunately my attempts so far haven't been successful: some code in `isMoreProfitable` estimates the cost of the vector loop plus scalar tail for the given VFs and ignores possible epilogue vectorization (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L5384). In all the cases I tried, this causes a small main VF (8 in that case) to be chosen whenever the epilogue vector loop would execute more than one iteration.
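
To double-check the trip-count arithmetic above, here is a minimal C++ model. It is a sketch under stated assumptions, not LoopVectorize's code: UF is assumed to be 1, the helper names (`vectorTripCount`, `splitIterations`) are hypothetical, and the epilogue VF of 2 in the TC=39 case is picked only for illustration. It follows the rule that, when a scalar epilogue is required, a zero remainder is bumped up to a full VF so that at least one iteration is left over.

```cpp
#include <cstdint>

using std::uint64_t;

// Iterations covered by a vector loop of width VF (UF assumed to be 1) over N
// remaining iterations. If a scalar epilogue is required, a zero remainder is
// replaced by VF so that at least one iteration is left for the next loop.
constexpr uint64_t vectorTripCount(uint64_t N, uint64_t VF,
                                   bool RequiresScalarEpilogue) {
  uint64_t R = N % VF;
  if (RequiresScalarEpilogue && R == 0)
    R = VF;
  return N < R ? 0 : N - R;
}

// Hypothetical breakdown of a trip count TC across the three loops.
struct Split {
  uint64_t MainIters;   // iterations of the main vector loop
  uint64_t EpiIters;    // iterations of the epilogue vector loop
  uint64_t ScalarIters; // iterations left for the scalar loop
};

constexpr Split splitIterations(uint64_t TC, uint64_t MainVF, uint64_t EpiVF,
                                bool RequiresScalarEpilogue) {
  uint64_t MainTC = vectorTripCount(TC, MainVF, RequiresScalarEpilogue);
  uint64_t Rem = TC - MainTC;
  uint64_t EpiTC = vectorTripCount(Rem, EpiVF, RequiresScalarEpilogue);
  return Split{MainTC / MainVF, EpiTC / EpiVF, Rem - EpiTC};
}

// TC=48, main VF=32, epilogue VF=8, scalar epilogue required:
// 1 main iteration, 1 epilogue iteration, 8 scalar iterations.
static_assert(splitIterations(48, 32, 8, true).EpiIters == 1, "");
// TC=49 with the same VFs: the epilogue vector loop runs twice.
static_assert(splitIterations(49, 32, 8, true).EpiIters == 2, "");
// Odd TC=39 with even VFs (epilogue VF=2 assumed) leaves a scalar iteration
// even when no scalar epilogue is required.
static_assert(splitIterations(39, 32, 2, false).ScalarIters == 1, "");
```

Because every check is a `static_assert`, the model is verified at compile time; other trip counts and VFs can be explored the same way.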

Ah, ok, that optimization was later disabled whenever epilogue vectorization is performed, by having LVP::executePlan() do `if (!IsEpilogueVectorization) VPlanTransforms::optimizeForVFAndUF()`.
(This may also explain the clean-up behavior observed in https://reviews.llvm.org/D154264#inline-1491906 :-)
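
Why the fold has to be skipped when the main and epilogue loops share a VPlan can be seen with the TC=49 example. This is a hypothetical C++ sketch, not `optimizeForVFAndUF` itself: it assumes UF=1, and `vectorTripCount` and `canFoldLatch` are illustrative names that treat the latch branch as foldable exactly when a loop runs a single vector iteration (VF == vector TC).

```cpp
#include <cstdint>

using std::uint64_t;

// Hypothetical vector-trip-count model (UF = 1): a zero remainder is replaced
// by VF when a scalar epilogue is required.
constexpr uint64_t vectorTripCount(uint64_t N, uint64_t VF,
                                   bool RequiresScalarEpilogue) {
  uint64_t R = N % VF;
  if (RequiresScalarEpilogue && R == 0)
    R = VF;
  return N < R ? 0 : N - R;
}

// The latch branch may exit unconditionally after the first iteration only
// if the loop runs exactly one vector iteration (VF == vector TC).
constexpr bool canFoldLatch(uint64_t VectorTC, uint64_t VF) {
  return VectorTC == VF;
}

// TC=49, main VF=32, epilogue VF=8, scalar epilogue required.
constexpr uint64_t TC = 49, MainVF = 32, EpiVF = 8;
constexpr uint64_t MainTC = vectorTripCount(TC, MainVF, true);        // 32
constexpr uint64_t EpiTC = vectorTripCount(TC - MainTC, EpiVF, true); // 16

// Foldable for the main loop but not for the epilogue loop, so a decision
// made for the main loop must not leak into a VPlan shared with the epilogue.
static_assert(canFoldLatch(MainTC, MainVF), "");
static_assert(!canFoldLatch(EpiTC, EpiVF), "");
```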

The explanation for test @pr56319 may be misleading, though, as it specifically deals with having the epilog loop vectorization consider requiresScalarEpilog:
; Test case where the exit condition in the main vector loop can be optimized
; to true, but not in the epilogue vector loop. **In the test the interleave
; group requires to execute at least one scalar iteration**, meaning the last
; vector iteration of the epilogue vector loop cannot be executed.

Could a test with a trip count of, say, 34 work? Without requiresScalarEpilog (i.e., when the last member of the interleave group is accessed), it would be vectorized with main VF=32 and epilog VF=2, whereas with requiresScalarEpilog the main loop with VF=32 would remain but epilog vectorization would be abandoned.
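
A quick model of the TC=34 scenario, assuming UF=1 and hypothetical helpers (`vectorTripCount`, `mainIters`, `epilogueIters`, not LoopVectorize's API) that leave at least one iteration over when a scalar epilogue is required. It only checks the iteration counts; it does not model how the cost model decides to abandon epilogue vectorization.

```cpp
#include <cstdint>

using std::uint64_t;

// Hypothetical vector-trip-count model (UF = 1): a zero remainder is replaced
// by VF when a scalar epilogue is required.
constexpr uint64_t vectorTripCount(uint64_t N, uint64_t VF,
                                   bool RequiresScalarEpilogue) {
  uint64_t R = N % VF;
  if (RequiresScalarEpilogue && R == 0)
    R = VF;
  return N < R ? 0 : N - R;
}

// Scenario from the question: TC=34, main VF=32, epilogue VF=2.
constexpr uint64_t TC = 34, MainVF = 32, EpiVF = 2;

constexpr uint64_t mainIters(bool RequiresScalarEpilogue) {
  return vectorTripCount(TC, MainVF, RequiresScalarEpilogue) / MainVF;
}

constexpr uint64_t epilogueIters(bool RequiresScalarEpilogue) {
  uint64_t MainTC = vectorTripCount(TC, MainVF, RequiresScalarEpilogue);
  return vectorTripCount(TC - MainTC, EpiVF, RequiresScalarEpilogue) / EpiVF;
}

// Without a required scalar epilogue, the epilogue vector loop covers the
// last two iterations.
static_assert(mainIters(false) == 1 && epilogueIters(false) == 1, "");
// With one, the main loop still runs once, but no epilogue vector iteration
// remains, so epilogue vectorization has nothing to do.
static_assert(mainIters(true) == 1 && epilogueIters(true) == 0, "");
```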

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D154261