[PATCH] D150873: [LoopVectorize] Consider interleaving when deciding if epilogue vectorisation is beneficial

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 22 00:43:15 PDT 2023


fhahn added inline comments.


================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll:76
+; INTERLEAVE-4-NEXT:    [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[TMP20]], i32 0
+; INTERLEAVE-4-NEXT:    [[WIDE_LOAD15:%.*]] = load <2 x i32>, ptr [[TMP21]], align 1
+; INTERLEAVE-4-NEXT:    [[TMP22]] = add <2 x i32> [[VEC_PHI14]], [[WIDE_LOAD15]]
----------------
Here it would probably be better to use VF=4 for the epilogue loop, so that it uses the full vector width. I think the existing logic for picking the epilogue vectorization factor selects the next lowest VF, which probably needs adjusting as well.
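
To illustrate the point about the epilogue VF, here is a rough, self-contained sketch. It is not the LoopVectorize implementation: `pickEpilogueVF` and its parameters are hypothetical names, and it assumes the only constraint is that the epilogue VF must be smaller than the number of elements the main vector loop consumes per iteration (main VF times interleave count). A real change would also have to consult the cost model.

```cpp
#include <vector>

// Hypothetical helper for illustration only; not an LLVM API.
// MainVF / MainIC: vectorization factor and interleave count of the main loop.
// CandidateVFs:    VFs assumed to be legal and profitable for the epilogue.
// Returns the chosen epilogue VF, or 1 to mean "keep a scalar epilogue".
unsigned pickEpilogueVF(unsigned MainVF, unsigned MainIC,
                        const std::vector<unsigned> &CandidateVFs) {
  // The main vector loop consumes MainVF * MainIC elements per iteration,
  // so up to MainVF * MainIC - 1 elements can be left for the epilogue.
  const unsigned MainStep = MainVF * MainIC;

  unsigned Best = 1;
  for (unsigned VF : CandidateVFs) {
    // An epilogue VF is only useful if it is smaller than the main step;
    // among those, prefer the widest one.
    if (VF < MainStep && VF > Best)
      Best = VF;
  }
  return Best;
}
```

Under these assumptions, a main loop with VF=4 and an interleave count of 4 leaves up to 15 elements, so `pickEpilogueVF(4, 4, {2, 4})` returns 4. Without interleaving, `pickEpilogueVF(4, 1, {2, 4})` returns 2, which matches the "next lowest VF" behaviour.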

In addition, it would be good to verify the impact with some microbenchmarks (these could be added here: https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization).
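
As a starting point for such a microbenchmark, here is a minimal sketch of the kind of kernel that could be measured. It uses plain `std::chrono` timing rather than the benchmark harness used in llvm-test-suite, and the names `sumReduction` and `nanosPerCall` are made up for illustration. The loop pragma is Clang-specific and forces VF=4 with an interleave count of 4 to mirror the INTERLEAVE-4 check lines; that mapping is an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer sum reduction, similar in shape to the loop in the test file.
uint32_t sumReduction(const uint32_t *A, std::size_t N) {
  uint32_t Sum = 0;
#if defined(__clang__)
// Request VF=4 and interleave count 4 (Clang-only loop hint).
#pragma clang loop vectorize_width(4) interleave_count(4)
#endif
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Rough average time per call in nanoseconds for a given trip count.
// A real benchmark should use the harness's facilities to stop the compiler
// from hoisting or folding the call; the volatile sink is only a crude guard.
double nanosPerCall(std::size_t N, int Reps) {
  std::vector<uint32_t> A(N, 1);
  volatile uint32_t Sink = 0;
  const auto Start = std::chrono::steady_clock::now();
  for (int R = 0; R < Reps; ++R)
    Sink = Sink + sumReduction(A.data(), N);
  const auto End = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(End - Start).count() / Reps;
}
```

The interesting trip counts are small ones just above a multiple of the main step (for example 20 to 31 with a step of 16), since that is where the choice of epilogue VF decides how many iterations fall through to the scalar remainder.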


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150873/new/

https://reviews.llvm.org/D150873


