[PATCH] D89566: [LV] Epilogue Vectorization with Optimal Control Flow

Fri Nov 27 07:37:12 PST 2020

bmahjour added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll:292

 define hidden void @pointer_phi_v16i8_add1(i8* noalias nocapture readonly %A, i8* noalias nocapture %B, i32 %y) {
 ; CHECK-LABEL: @pointer_phi_v16i8_add1(
----------------
dmgreen wrote:
> I was surprised to see an MVE test like this choose to try to epilogue vectorize. I had presumed that would not happen on MVE - we only have a single vector width with no interleaving - the benefit of trying to do a single <8 x i8> iteration after a <16 x i8> main loop is not going to be worth the additional branching/setup we have to do, unfortunately. I ran some extra tests and added a mve-qabs.ll test, where again the <16 x i8> loop is getting a remainder where it isn't beneficial.
> 
> I don't believe that MVE is a vector target that would ever benefit from epilogue vectorization, unfortunately. Can we get some sort of target hook that allows us to disable it? Perhaps something that sets a maximum epilogue vectorization factor given a VF * UF main loop? That would allow us to set it to none, whilst others tune it for their needs, like possibly always having the fallback as a 64-bit vector under AArch64 (just a thought, not sure if that's the best idea or not, but it at least allows targets to tune things).
> I ran some extra tests and added a mve-qabs.ll test, where again the <16 x i8> loop is getting a remainder where it isn't beneficial.

Is it degrading performance, or is it just not beneficial (harmless)? As I mentioned before, the heuristic in this patch is not very good, but putting the cost modeling on the critical path for getting the codegen implemented is also not desirable. I had suggested disabling this transformation by default until a proper cost model is implemented, but some people disagreed.

To come up with a meaningful target hook, it would help to know which machine characteristics of MVE make epilogue vectorization unprofitable. Are there existing TTI hooks that we can use (e.g. `getMaxInterleaveFactor() > 1`)?
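
To make the discussion concrete, here is a minimal standalone sketch of the kind of hook suggested in the quoted comment: given the main loop's VF * UF, it returns a maximum epilogue VF, with 0 meaning "do not epilogue vectorize". Everything in it is hypothetical and for illustration only: `EpilogueLimits`, `MaxEpilogueVectorBits`, and `getMaxEpilogueVF` are invented names, not existing TTI interfaces, and the halving rule is an assumption rather than a tuned heuristic.

```cpp
#include <cassert>

// Hypothetical per-target tuning knob (not part of LLVM's TTI).
// MaxEpilogueVectorBits == 0 means the target opts out entirely.
struct EpilogueLimits {
  unsigned MaxEpilogueVectorBits;
};

// Hypothetical hook: returns the largest epilogue VF the target allows for
// a main loop that processes MainVF * MainUF elements per iteration, or 0
// if epilogue vectorization should be skipped.
unsigned getMaxEpilogueVF(const EpilogueLimits &Limits, unsigned ElementBits,
                          unsigned MainVF, unsigned MainUF) {
  assert(ElementBits > 0 && MainVF > 0 && MainUF > 0 && "invalid loop shape");

  // Target opted out (e.g. what an MVE-like target might choose).
  if (Limits.MaxEpilogueVectorBits == 0)
    return 0;

  // Number of elements that fit in the target's preferred epilogue vector.
  unsigned Cap = Limits.MaxEpilogueVectorBits / ElementBits;

  // The epilogue must do strictly less work per iteration than the main
  // loop; otherwise it could never execute. Assumption: fall back to half.
  unsigned MainWidth = MainVF * MainUF;
  if (Cap >= MainWidth)
    Cap = MainWidth / 2;

  // A "vector" of fewer than two elements is just the scalar remainder.
  return Cap >= 2 ? Cap : 0;
}
```

With this shape, a target that never benefits would use `EpilogueLimits{0}`, while a target preferring a 64-bit fallback would use `EpilogueLimits{64}`; for a <16 x i8> main loop with UF = 2, the latter yields an epilogue VF of 8.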

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89566/new/

https://reviews.llvm.org/D89566