[PATCH] D150873: [LoopVectorize] Consider interleaving when deciding if epilogue vectorisation is beneficial

Mon May 22 00:43:15 PDT 2023

fhahn added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll:76
+; INTERLEAVE-4-NEXT:    [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[TMP20]], i32 0
+; INTERLEAVE-4-NEXT:    [[WIDE_LOAD15:%.*]] = load <2 x i32>, ptr [[TMP21]], align 1
+; INTERLEAVE-4-NEXT:    [[TMP22]] = add <2 x i32> [[VEC_PHI14]], [[WIDE_LOAD15]]
----------------
I think it would probably be better to use VF=4 for the epilogue loop here, so that it uses the full vector width. I think the existing logic for picking the epilogue vectorization factor picks the next lowest VF, which probably needs adjusting as well.
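
To illustrate the point about the epilogue VF, here is a rough standalone sketch. It is not the actual LoopVectorize code: both function names, the candidate list, and the assumed main-loop VF of 4 with an interleave count of 4 are hypothetical and only meant to show the difference between the two policies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): pick the largest candidate VF that is
// strictly smaller than the main loop VF. Returns 0 if no candidate qualifies.
unsigned pickEpilogueVFNextLowest(const std::vector<unsigned> &Candidates,
                                  unsigned MainVF) {
  assert(MainVF > 0 && "main VF must be non-zero");
  unsigned Best = 0;
  for (unsigned VF : Candidates)
    if (VF < MainVF)
      Best = std::max(Best, VF);
  return Best;
}

// Hypothetical alternative: an interleaved main loop consumes MainVF * IC
// elements per iteration, so an epilogue VF up to MainVF can still be useful
// as long as it is smaller than that step. Returns 0 if no candidate qualifies.
unsigned pickEpilogueVFWithInterleave(const std::vector<unsigned> &Candidates,
                                      unsigned MainVF, unsigned IC) {
  assert(MainVF > 0 && IC > 0 && "VF and interleave count must be non-zero");
  unsigned Step = MainVF * IC;
  unsigned Best = 0;
  for (unsigned VF : Candidates)
    if (VF <= MainVF && VF < Step)
      Best = std::max(Best, VF);
  return Best;
}
```

Under these assumptions, with candidates {2, 4}, a main VF of 4 and an interleave count of 4, the first policy picks 2 and the second picks 4. Without interleaving (IC=1) both pick 2.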

In addition, it would be good to verify the impact with some microbenchmarks (they could be added under https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150873/new/

https://reviews.llvm.org/D150873