[all-commits] [llvm/llvm-project] de111a: NFC: Use generate_test_checks script for LV tests ...
sdesmalen-arm via All-commits
all-commits at lists.llvm.org
Wed Mar 1 01:02:21 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: de111ae70a5ae1878762bd38ec3d4f1e149a18f8
https://github.com/llvm/llvm-project/commit/de111ae70a5ae1878762bd38ec3d4f1e149a18f8
Author: Sander de Smalen <sander.desmalen at arm.com>
Date: 2023-03-01 (Wed, 01 Mar 2023)
Changed paths:
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll
Log Message:
-----------
NFC: Use generate_test_checks script for LV tests which seem to have been auto-generated.
Commit: fe1b51ffee697642a8e636b597cba7ae9b809245
https://github.com/llvm/llvm-project/commit/fe1b51ffee697642a8e636b597cba7ae9b809245
Author: Sander de Smalen <sander.desmalen at arm.com>
Date: 2023-03-01 (Wed, 01 Mar 2023)
Changed paths:
M llvm/include/llvm/Analysis/TargetTransformInfo.h
M llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
M llvm/lib/Transforms/Vectorize/VPlan.h
M llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
M llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
M llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
M llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
A llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
Log Message:
-----------
[LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow
(the next vector iteration's predicate is generated with the llvm.active.lane.mask
intrinsic and then tested for the backedge), the LoopVectorizer still inserts a
runtime check to see if the 'i + VF' may at any point overflow for the given
trip-count. When it does, it falls back to a scalar epilogue loop.
We can get rid of that runtime check in the pre-header and therefore also
remove the scalar epilogue loop. This reduces code-size and avoids a runtime
check.
Consider the following loop:
void foo(char * __restrict__ dst, char *src, unsigned long N) {
for (unsigned long i=0; i<N; ++i)
dst[i] = src[i] + 42;
}
If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter
will overflow when calculating the predicate for the next vector iteration
at some point, because LLVM does:
vector.ph:
%active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
vector.body:
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
...
%index.next = add i64 %index, 16
; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
%8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
The solution:
What we can do instead is calculate the predicate before incrementing
the loop iteration counter, such that the llvm.active.lane.mask is
calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.
vector.ph:
%active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
%N_minus_VF = select %N > 16 ? %N - 16 : 0
vector.body:
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
...
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
%index.next = add i64 %index, %4
; The add above may still overflow, but this time the active.lane.mask is not affected
%8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
For N = 20, we'd then get:
vector.ph:
%active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
%N_minus_VF = select 20 > 16 ? 20 - 16 : 0
; %N_minus_VF = 4
vector.body: (1st iteration)
... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
...
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
%index.next = add i64 0, 16
; %index.next = 16
%8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
; %8 = 1
br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
; branch to %vector.body
vector.body: (2nd iteration)
... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
...
%active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
%index.next = add i64 16, 16
; %index.next = 32
%8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
; %8 = 0
br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
; branch to %for.cond.cleanup
Reviewed By: fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D142109
Compare: https://github.com/llvm/llvm-project/compare/4d513bd9a542...fe1b51ffee69
More information about the All-commits
mailing list