[llvm] [LoopVectorize] Allow Early-Exit Loop Vectorization with EVL (PR #130918)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 26 10:56:03 PDT 2025


================
@@ -4038,10 +4038,12 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   }
 
   // The only loops we can vectorize without a scalar epilogue, are loops with
-  // a bottom-test and a single exiting block. We'd have to handle the fact
-  // that not every instruction executes on the last iteration.  This will
-  // require a lane mask which varies through the vector loop body.  (TODO)
-  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+  // a bottom-test and a single exiting block or those with early exits. We'd
+  // have to handle the fact that not every instruction executes on the last
+  // iteration. This will require a lane mask which varies through the vector
+  // loop body. (TODO)
+  if ((TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) &&
+      !Legal->hasUncountableEarlyExit()) {
----------------
david-arm wrote:

Also, I just tried out your patch, and when tail-folding is used in combination with `get.active.lane.mask` for early-exit loops, the generated IR is broken. I modified `same_exit_block_pre_inc_use1` to have a trip count of 63 and removed the outside uses of the induction variable, which gave me vectorised IR like this:

```
  %wide.masked.load2 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr %9, i32 1, <vscale x 16 x i1> %active.lane.mask, <vscale x 16 x i8> poison)
  %10 = icmp eq <vscale x 16 x i8> %wide.masked.load, %wide.masked.load2
  %11 = select <vscale x 16 x i1> %active.lane.mask, <vscale x 16 x i1> %10, <vscale x 16 x i1> zeroinitializer
  %index.next3 = add i64 %index1, %4
  %12 = xor <vscale x 16 x i1> %11, splat (i1 true)
  %13 = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %12)
  %active.lane.mask.next = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next3, i64 63)
  %14 = xor <vscale x 16 x i1> %active.lane.mask.next, splat (i1 true)
  %15 = extractelement <vscale x 16 x i1> %14, i32 0
  br i1 %15, label %middle.split, label %vector.body, !llvm.loop !0
```

The branch condition should be an `or` of `%13` and `%15`. As written, the branch tests only `%15`, so the early-exit result in `%13` is ignored. Unfortunately, I think I have to request changes on this PR for now. Sorry about that!
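
To make the missing term concrete, here is a rough scalar model in C++ of the exit decision in the vector loop. It is only a sketch under stated assumptions, not code from the patch or from LoopVectorize: `laneActive` and `vectorIterations` are hypothetical names, the vectorisation factor is a fixed number passed by the caller rather than a scalable `vscale x 16`, and the early-exit check is simplified to "some active lane loaded different bytes".

```cpp
#include <cstdint>
#include <vector>

// Lane `lane` is active when index + lane < tripCount, mirroring the
// documented semantics of llvm.get.active.lane.mask.
bool laneActive(uint64_t index, uint64_t lane, uint64_t tripCount) {
  return index + lane < tripCount;
}

// Hypothetical model: returns how many vector iterations run before the loop
// branches to the exit. `orEarlyExit` selects which branch condition is used:
//   false -> latch condition only (what the IR above does)
//   true  -> early-exit condition OR latch condition (the requested fix)
uint64_t vectorIterations(const std::vector<uint8_t> &a,
                          const std::vector<uint8_t> &b, uint64_t vf,
                          uint64_t tripCount, bool orEarlyExit) {
  uint64_t iterations = 0;
  for (uint64_t index = 0;; index += vf) {
    ++iterations;

    // Stands in for %13: true if any active lane sees a mismatch.
    bool earlyExit = false;
    for (uint64_t lane = 0; lane < vf; ++lane)
      if (laneActive(index, lane, tripCount) &&
          a[index + lane] != b[index + lane])
        earlyExit = true;

    // Stands in for %15: lane 0 of the next active lane mask is inactive.
    bool latchExit = !laneActive(index + vf, 0, tripCount);

    bool leave = orEarlyExit ? (earlyExit || latchExit) : latchExit;
    if (leave)
      return iterations;
  }
}
```

With a trip count of 63, a factor of 16, and a mismatch at element 3, the model leaves the loop after the first vector iteration when both conditions are combined, but keeps running until the lane mask is exhausted when only the latch condition is tested.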

https://github.com/llvm/llvm-project/pull/130918

