[PATCH] D115713: [LV] Don't apply "TinyTripCountVectorThreshold" for loops with compile time known TC.
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 10 00:27:55 PDT 2022
fhahn added inline comments.
================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/mve-known-trip-count.ll:7
-; Trip count of 5 - shouldn't be vectorized.
+; Trip count of 5 - vectorized with VF=4 plus one scalar iteration.
; CHECK-LABEL: tripcount5
----------------
dmgreen wrote:
> I don't think we want this - it is worse. At least that is what my benchmarks suggest.
>
> That was the point of D101726. 1 vector + 1 masked vector iteration when unrolled was worse than 5 scalar because of the overheads of vector instructions. 1 vector + 1 scalar will be in the same boat.
I also think this would probably need to be target/CPU dependent. For example, on some AArch64 CPUs at least 2 vector iterations are usually needed for the vector code to be profitable when there is a scalar tail.
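To make the trade-off concrete, here is a rough sketch of what a target-dependent check could look like. It is a toy model, not the LoopVectorize cost model: `TinyLoopCosts`, its fields, and `shouldVectorizeKnownTC` are hypothetical names that do not exist in LLVM, and the cost values would have to come from the target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-target knobs; none of these names exist in LLVM.
struct TinyLoopCosts {
  unsigned ScalarIterCost; // cost of one scalar iteration
  unsigned VectorIterCost; // cost of one vector iteration covering VF lanes
  unsigned VectorOverhead; // fixed cost of entering the vector loop
  unsigned MinVectorIters; // vector iterations required if a scalar tail remains
};

// Returns true if vectorizing a loop with compile-time known trip count TC
// at width VF looks profitable under this toy model.
bool shouldVectorizeKnownTC(uint64_t TC, unsigned VF, const TinyLoopCosts &C) {
  assert(VF >= 2 && "VF must be a vector width");

  uint64_t VecIters = TC / VF;  // full vector iterations
  uint64_t TailIters = TC % VF; // leftover scalar iterations

  // Nothing to vectorize if not even one full vector iteration fits.
  if (VecIters == 0)
    return false;

  // Some targets want several vector iterations before a scalar tail pays off.
  if (TailIters != 0 && VecIters < C.MinVectorIters)
    return false;

  uint64_t ScalarCost = TC * C.ScalarIterCost;
  uint64_t VectorCost = C.VectorOverhead + VecIters * C.VectorIterCost +
                        TailIters * C.ScalarIterCost;
  return VectorCost < ScalarCost;
}
```

With made-up costs of `{1, 2, 3, 2}`, a trip count of 5 at VF=4 is rejected (one vector iteration plus a scalar tail), while a trip count of 8 is accepted. The same shape would let each target or CPU pick its own minimum number of vector iterations instead of relying on one global threshold.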
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115713/new/
https://reviews.llvm.org/D115713