[PATCH] D150873: [LoopVectorize] Consider interleaving when deciding if epilogue vectorisation is beneficial
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 22 00:43:15 PDT 2023
fhahn added inline comments.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll:76
+; INTERLEAVE-4-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[TMP20]], i32 0
+; INTERLEAVE-4-NEXT: [[WIDE_LOAD15:%.*]] = load <2 x i32>, ptr [[TMP21]], align 1
+; INTERLEAVE-4-NEXT: [[TMP22]] = add <2 x i32> [[VEC_PHI14]], [[WIDE_LOAD15]]
----------------
It would probably be better to use VF=4 for the epilogue loop here, so that it uses the full vector width. I think the existing logic for choosing the epilogue vectorization factor picks the next lowest VF, which probably needs adjusting as well.
In addition, it would be good to verify the impact with some microbenchmarks (these could be added at https://github.com/llvm/llvm-test-suite/tree/main/MicroBenchmarks/LoopVectorization).
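To make the epilogue VF suggestion concrete, here is a minimal sketch of one possible selection policy. It is not the LoopVectorize implementation: `pickEpilogueVF` and its parameters are hypothetical names, and the numbers in the usage note (main VF=4, interleave count 4, a maximum of 4 x i32 lanes per vector register) are assumptions chosen for illustration, not values taken from the patch.

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM API.
// MainVF: vectorization factor of the main vector loop.
// MainIC: interleave count of the main vector loop.
// MaxVF:  widest VF that fits a vector register for the element type (assumed input).
// Returns the epilogue VF to use; 1 means "keep a scalar epilogue".
unsigned pickEpilogueVF(unsigned MainVF, unsigned MainIC, unsigned MaxVF) {
  assert(MainVF >= 1 && MainIC >= 1 && MaxVF >= 1 && "factors must be positive");

  // Scalar iterations consumed by one iteration of the interleaved main loop.
  // At most Step - 1 iterations are left over for the epilogue.
  unsigned Step = MainVF * MainIC;

  // Grow the epilogue VF in powers of two while it still fits the register
  // width and stays strictly below Step, so the epilogue can actually execute.
  unsigned VF = 1;
  while (VF * 2 <= MaxVF && VF * 2 < Step)
    VF *= 2;
  return VF;
}
```

Under these assumptions, `pickEpilogueVF(4, 4, 4)` yields 4, because the interleaved main loop leaves up to 15 remainder iterations and a full-width epilogue can still run. Without interleaving, `pickEpilogueVF(4, 1, 4)` yields 2, which matches the "next lowest VF" behaviour described above.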
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D150873/new/
https://reviews.llvm.org/D150873