[PATCH] D115713: [LV] Don't apply "TinyTripCountVectorThreshold" for loops with compile time known TC.

Mon Oct 10 00:27:55 PDT 2022

fhahn added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/mve-known-trip-count.ll:7

-; Trip count of 5 - shouldn't be vectorized.
+; Trip count of 5 - vectorized with VF=4 plus one scalar iteration.
 ; CHECK-LABEL: tripcount5
----------------
dmgreen wrote:
> I don't think we want this - it is worse. At least that is what my benchmarks suggest.
> 
> That was the point of D101726. 1 vector + 1 masked vector iteration, when unrolled, was worse than 5 scalar iterations because of the overhead of the vector instructions. 1 vector + 1 scalar iteration will be in the same boat.
I also think we would probably need to make this target/CPU-dependent. We also have some AArch64 CPUs where, if there is a scalar tail, usually at least 2 vector iterations are needed for the vector code to be profitable.
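For illustration only, the trip-count arithmetic behind this thread can be sketched as below. This is a hypothetical model, not LoopVectorize's cost model: `split_trip_count`, `worth_vectorizing` and the `min_vector_iters` knob are invented names standing in for a per-target "minimum vector iterations when there is a scalar tail" rule, and masked (tail-folded) iterations are not modeled.

```python
# Hypothetical sketch: NOT LoopVectorize's actual cost model.

def split_trip_count(trip_count: int, vf: int) -> tuple[int, int]:
    """Return (vector_iterations, scalar_tail_iterations) for an unmasked
    vector loop with vectorization factor `vf` and a scalar epilogue."""
    return trip_count // vf, trip_count % vf


def worth_vectorizing(trip_count: int, vf: int, min_vector_iters: int) -> bool:
    """Hypothetical per-target rule: when a scalar tail remains, require at
    least `min_vector_iters` full vector iterations."""
    vector_iters, tail = split_trip_count(trip_count, vf)
    if vector_iters == 0:
        return False  # not even one full vector iteration
    if tail == 0:
        return True  # no scalar tail, so the rule does not apply
    return vector_iters >= min_vector_iters


# tripcount5 with VF=4: one vector iteration plus one scalar iteration.
print(split_trip_count(5, 4))  # (1, 1)
print(worth_vectorizing(5, 4, min_vector_iters=2))  # False
```

With `min_vector_iters=2`, the TC=5/VF=4 case from the test is rejected, while TC=9 (2 vector + 1 scalar) would be accepted; making the threshold a parameter mirrors the suggestion that it be target/CPU-dependent.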

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D115713