[llvm] [LoopVectorize] Allow Early-Exit Loop Vectorization with EVL (PR #130918)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 26 10:56:03 PDT 2025
================
@@ -4038,10 +4038,12 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
}
// The only loops we can vectorize without a scalar epilogue, are loops with
- // a bottom-test and a single exiting block. We'd have to handle the fact
- // that not every instruction executes on the last iteration. This will
- // require a lane mask which varies through the vector loop body. (TODO)
- if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+ // a bottom-test and a single exiting block or those with early exits. We'd
+ // have to handle the fact that not every instruction executes on the last
+ // iteration. This will require a lane mask which varies through the vector
+ // loop body. (TODO)
+ if ((TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) &&
+ !Legal->hasUncountableEarlyExit()) {
----------------
david-arm wrote:
Also, I just tried out your patch, and when we use tail-folding in combination with `get.active.lane.mask` for early-exit loops the IR is broken. I modified `same_exit_block_pre_inc_use1` to have a trip count of 63 and removed the outside uses of the induction variable, so that I ended up with vectorised IR like this:
```
%wide.masked.load2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr %9, i32 1, <vscale x 16 x i1> %active.lane.mask, <vscale x 16 x i8> poison)
%10 = icmp eq <vscale x 16 x i8> %wide.masked.load, %wide.masked.load2
%11 = select <vscale x 16 x i1> %active.lane.mask, <vscale x 16 x i1> %10, <vscale x 16 x i1> zeroinitializer
%index.next3 = add i64 %index1, %4
%12 = xor <vscale x 16 x i1> %11, splat (i1 true)
%13 = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %12)
%active.lane.mask.next = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next3, i64 63)
%14 = xor <vscale x 16 x i1> %active.lane.mask.next, splat (i1 true)
%15 = extractelement <vscale x 16 x i1> %14, i32 0
br i1 %15, label %middle.split, label %vector.body, !llvm.loop !0
```
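The branch condition should be an `or` of `%13` (the early-exit check) and `%15` (the lane-mask check), but here the branch only tests `%15`, so the result of the early-exit comparison is ignored. Unfortunately, I think I have to request changes on this PR for now. Sorry about that!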
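To illustrate why the two conditions have to be combined, here is a minimal scalar C++ model of a tail-folded early-exit loop. It is only a sketch and not LLVM code: the fixed lane count `VF`, the helpers `activeLaneMask` and `runVectorLoop`, and the `combineExits` flag are all hypothetical names introduced for this example, and the early exit is simplified to "some active lane differs". With `combineExits` set to false the loop behaves like the IR above and only leaves once the lane mask runs out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed lane count; the real IR uses a scalable vector.
constexpr std::size_t VF = 16;
using Mask = std::array<bool, VF>;

// Models get.active.lane.mask: lane i is active iff base + i < tripCount.
Mask activeLaneMask(std::uint64_t base, std::uint64_t tripCount) {
  Mask m{};
  for (std::size_t i = 0; i < VF; ++i)
    m[i] = base + i < tripCount;
  return m;
}

// Runs the modelled vector loop and returns how many vector iterations
// execute before the loop is left.
// combineExits == true : branch on (earlyExit || latchExit)
// combineExits == false: branch on latchExit only, as in the broken IR
std::uint64_t runVectorLoop(const std::vector<std::uint8_t> &a,
                            const std::vector<std::uint8_t> &b,
                            std::uint64_t tripCount, bool combineExits) {
  assert(a.size() >= tripCount && b.size() >= tripCount);
  std::uint64_t index = 0, iterations = 0;
  Mask mask = activeLaneMask(index, tripCount);
  while (true) {
    ++iterations;
    // Early-exit check: does any active lane see a mismatch?
    bool earlyExit = false;
    for (std::size_t lane = 0; lane < VF; ++lane)
      if (mask[lane] && a[index + lane] != b[index + lane])
        earlyExit = true;
    // Latch check: the first lane of the next mask is inactive.
    index += VF;
    mask = activeLaneMask(index, tripCount);
    bool latchExit = !mask[0];
    bool leave = combineExits ? (earlyExit || latchExit) : latchExit;
    if (leave)
      return iterations;
  }
}
```

With a trip count of 63 and a mismatch in the first vector iteration, the combined condition leaves after one iteration, while the lane-mask-only condition keeps running until all 63 elements have been covered.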
https://github.com/llvm/llvm-project/pull/130918