[PATCH] D142109: [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.

Mon Jan 30 05:30:49 PST 2023

sdesmalen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8747
+
+    auto *TCMinusVF =
+        new VPInstruction(VPInstruction::CalculateTripCountMinusVF, {TC}, DL);
----------------
david-arm wrote:
> I wonder if it's possible to avoid this extra work in the preheader if we know that we can never overflow? For example, I'm thinking of a loop like this:
> 
>   void foo(int* __restrict__ dst, int* __restrict__ src) {
>     for (long i = 0; i < 7; i++) {
>       dst[i] = 1000 / src[i];
>     }
>   }
> 
> where we will vectorise with tail-folding (assuming the unroller hasn't already destroyed the vectorisation opportunity!). For vscale=2 this is actually just a single iteration, so the additional work in the preheader becomes more significant.
That's a good point! In cases where we know no overflow can occur, it's probably better to avoid this style of tail-folding. It seems that GCC follows a similar approach.
I'll post a separate patch for that so that this patch remains relatively simple to review.

I've already tried to simplify things a bit by distinguishing `DataAndControlFlow` and `DataAndControlFlowWithoutRuntimeCheck`, so that we can easily fall back on the former if we know the runtime check can be folded away.
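
To make the overflow concern concrete, here is a rough scalar model of a tail-folded loop that decides whether to continue by comparing the index against a saturating `TC - VF` computed once up front. This is only a sketch under assumptions: the names `tripCountMinusVF` and `runTailFolded` are hypothetical, the 8-bit index type and fixed `VF = 4` are chosen purely so that wrap-around is easy to see, and the saturating subtract is my reading of what `CalculateTripCountMinusVF` is meant to produce. It is not the code the patch generates.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative vectorization factor (an assumption, not taken from the patch).
constexpr uint8_t VF = 4;

// Saturating TC - VF: the extra preheader work discussed above.
// Yields 0 when TC <= VF, so the subtraction itself can never wrap.
uint8_t tripCountMinusVF(uint8_t TC) {
  return TC > VF ? static_cast<uint8_t>(TC - VF) : 0;
}

// Scalar model of a tail-folded loop over TC elements.
// Returns the number of active lanes executed and reports the number of
// vector iterations through VectorIterations.
unsigned runTailFolded(uint8_t TC, unsigned &VectorIterations) {
  VectorIterations = 0;
  if (TC == 0)
    return 0;

  const uint8_t Limit = tripCountMinusVF(TC);
  unsigned LanesProcessed = 0;
  uint8_t Index = 0;
  bool Continue = true;

  while (Continue) {
    // Predicated body: lane L is active iff Index + L < TC. The sum is
    // widened here so that the model itself does not wrap.
    for (unsigned L = 0; L < VF; ++L)
      if (static_cast<unsigned>(Index) + L < TC)
        ++LanesProcessed;
    ++VectorIterations;

    // "Is there another iteration?" means Index + VF < TC. Written as
    // Index < Limit, it needs no addition that could overflow.
    Continue = Index < Limit;

    // This increment may wrap on the final iteration (e.g. TC = 255), but
    // the wrapped value is never used because Continue is already false.
    Index = static_cast<uint8_t>(Index + VF);
  }

  // Every element is processed exactly once.
  assert(LanesProcessed == TC);
  return LanesProcessed;
}
```

With `TC = 7` this model runs two iterations (four active lanes, then three), and with `TC = 255` the 8-bit index wraps only after the exit decision has been made, which is why no separate runtime overflow check is needed in this scheme. When the trip count is a small known constant, as in the example above, the saturating subtract is pure overhead.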

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142109/new/

https://reviews.llvm.org/D142109