[PATCH] D142109: [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 25 01:45:39 PST 2023
david-arm added a comment.
This is a nice patch @sdesmalen! It's great to remove the overflow checks and scalar loop - a double win for performance and code size. :) I just had a comment about ways we can avoid overflow checks completely for low trip counts.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8747
+
+ auto *TCMinusVF =
+ new VPInstruction(VPInstruction::CalculateTripCountMinusVF, {TC}, DL);
----------------
I wonder if it's possible to avoid this extra work in the preheader when we know that we can never overflow? For example, I'm thinking of a loop like this:
  void foo(int* __restrict__ dst, int* __restrict__ src) {
    for (long i = 0; i < 7; i++) {
      dst[i] = 1000 / src[i];
    }
  }
where we will vectorise with tail-folding (assuming the unroller hasn't already destroyed the vectorisation opportunity!). For vscale=2 this is actually just a single iteration, so the additional work in the preheader becomes more significant.
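To make the arithmetic concrete, here is a rough C++ sketch (not the actual VPlan code). It assumes VF = vscale x 4 for the i32 elements, so VF = 8 at vscale=2, and it assumes CalculateTripCountMinusVF behaves like a saturating TC - VF. The helper names and the MaxVF bound are made up for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of vector iterations in a tail-folded loop: ceil(TC / VF).
// Precondition: VF >= 1.
constexpr uint64_t vectorIterations(uint64_t TC, uint64_t VF) {
  return TC / VF + (TC % VF != 0 ? 1 : 0);
}

// Assumed semantics of the preheader computation: max(TC - VF, 0),
// i.e. a subtraction that saturates at zero instead of wrapping.
constexpr uint64_t tripCountMinusVF(uint64_t TC, uint64_t VF) {
  return TC > VF ? TC - VF : 0;
}

// Hypothetical static check: could rounding TC up to a multiple of the
// largest possible VF wrap in an IndexBits-wide unsigned induction variable?
// Precondition: 1 <= MaxVF <= 2^IndexBits - 1 and 1 <= IndexBits <= 64.
constexpr bool roundUpCanOverflow(uint64_t TC, uint64_t MaxVF,
                                  unsigned IndexBits) {
  const uint64_t MaxVal = IndexBits >= 64
                              ? std::numeric_limits<uint64_t>::max()
                              : (uint64_t(1) << IndexBits) - 1;
  return TC > MaxVal - (MaxVF - 1);
}

// The loop from the comment: TC = 7, VF = 8 (vscale=2, assumed vscale x 4).
static_assert(vectorIterations(7, 8) == 1, "single vector iteration");
static_assert(tripCountMinusVF(7, 8) == 0, "saturates to zero");
// With a constant TC of 7, even a generous MaxVF of 64 cannot wrap an i64.
static_assert(!roundUpCanOverflow(7, 64, 64), "no overflow possible");
```

If a check along these lines holds at compile time for a constant trip count, the preheader work looks redundant for that loop, which is the case I had in mind.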
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142109/new/
https://reviews.llvm.org/D142109