[PATCH] D142109: [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 30 05:30:49 PST 2023
sdesmalen added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8747
+
+  auto *TCMinusVF =
+      new VPInstruction(VPInstruction::CalculateTripCountMinusVF, {TC}, DL);
----------------
david-arm wrote:
> I wonder if it's possible to avoid this extra work in the preheader if we know that we can never overflow? For example, I'm thinking of a loop like this:
>
>   void foo(int* __restrict__ dst, int* __restrict__ src) {
>     for (long i = 0; i < 7; i++) {
>       dst[i] = 1000 / src[i];
>     }
>   }
>
> where we will vectorise with tail-folding (assuming the unroller hasn't already destroyed the vectorisation opportunity!). For vscale=2 this is actually just a single iteration, so the additional work in the preheader becomes more significant.
That's a good point! For instances where we know no overflow can occur, it's probably better to avoid this style of tail-folding. It seems that GCC follows a similar approach.
I'll post a separate patch for that so that this patch remains relatively simple to review.
I've already tried to simplify things a bit by distinguishing `DataAndControlFlow` and `DataAndControlFlowWithoutRuntimeCheck`, so that we can easily fall back on the former if we know the runtime check can be folded away.
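
To make the trade-off concrete, here is a rough scalar model of the arithmetic under discussion. It is only a sketch, not the code the patch generates: the helper names `trip_count_minus_vf` and `vector_iterations` are made up for illustration, and it assumes that `CalculateTripCountMinusVF` amounts to a saturating `TC - VF` and that the loop keeps going while the current index is below that value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturating TC - VF. It yields 0 when TC <= VF,
 * so the subtraction itself can never wrap. */
static uint64_t trip_count_minus_vf(uint64_t tc, uint64_t vf) {
    return tc > vf ? tc - vf : 0;
}

/* Hypothetical helper: counts the vector iterations of a tail-folded loop
 * that continues while index < (TC - VF, saturated). That condition is
 * equivalent to index + VF < TC, but is evaluated without computing a
 * value that could overflow. Assumes tc >= 1 and vf >= 1. */
static uint64_t vector_iterations(uint64_t tc, uint64_t vf) {
    assert(tc >= 1 && vf >= 1);
    uint64_t limit = trip_count_minus_vf(tc, vf); /* preheader work */
    uint64_t index = 0;
    uint64_t iterations = 0;
    for (;;) {
        /* A real vector body would process lanes [index, index + vf),
         * masking off every lane at or beyond tc. */
        iterations++;
        if (index >= limit)
            break;       /* no lane would be active in the next iteration */
        index += vf;     /* safe: index < tc - vf, so index + vf < tc */
    }
    return iterations;
}
```

For the loop quoted above, and assuming four 32-bit lanes per vscale unit so that VF = 8 at vscale=2, `trip_count_minus_vf(7, 8)` is 0 and `vector_iterations(7, 8)` is 1. The select in the preheader is then paid for a loop that runs once, which is the cost the comment is concerned about.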
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142109/new/
https://reviews.llvm.org/D142109