[llvm] [LoopVectorize] Allow Early-Exit Loop Vectorization with EVL (PR #130918)

Shih-Po Hung via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 28 09:58:01 PDT 2025


================
@@ -4038,10 +4038,12 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
   }
 
   // The only loops we can vectorize without a scalar epilogue, are loops with
-  // a bottom-test and a single exiting block. We'd have to handle the fact
-  // that not every instruction executes on the last iteration.  This will
-  // require a lane mask which varies through the vector loop body.  (TODO)
-  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
+  // a bottom-test and a single exiting block or those with early exits. We'd
+  // have to handle the fact that not every instruction executes on the last
+  // iteration. This will require a lane mask which varies through the vector
+  // loop body. (TODO)
+  if ((TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) &&
+      !Legal->hasUncountableEarlyExit()) {
----------------
arcbbb wrote:

Thanks for the detailed feedback; it's super helpful!
This work was inspired by your patch (PR #120603), and I would like to extend it by using the vp.load.ff intrinsics (PR #128593) to safely handle accesses that could otherwise go out of bounds. My goal is to introduce a new WidenFFLoad recipe that enables vectorization of loops like std::find with EVL-based tail-folding.
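To make the idea concrete, here is a minimal scalar sketch of how a std::find-style loop could behave when it combines an explicit vector length (EVL) with a fault-only-first load. It is not the code from this PR and does not use any LLVM API: `VF`, `loadFaultOnlyFirst`, `findWithEVL`, the `readable` parameter and `kSample` are all hypothetical names made up for illustration. The `readable` argument stands in for "how many elements can be read before the first faulting address", which real hardware would discover on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative vectorization factor; not taken from the patch.
constexpr std::size_t VF = 4;

// Hypothetical stand-in for a fault-only-first load: asks for `evl` lanes
// starting at `src`, but only copies the lanes that are readable. Lane 0 must
// be accessible; later lanes are silently trimmed instead of faulting.
// Returns the number of lanes actually loaded.
std::size_t loadFaultOnlyFirst(const int *src, std::size_t evl,
                               std::size_t readable, int *dst) {
  assert(evl >= 1 && readable >= 1 && "lane 0 must be accessible");
  const std::size_t loaded = std::min(evl, readable);
  std::copy(src, src + loaded, dst);
  return loaded;
}

// Scalar model of an early-exit search over [data, data + n). Only the first
// `readable` elements are assumed dereferenceable (assumption of this sketch).
// Returns the index of the first match, or n if there is none.
std::size_t findWithEVL(const int *data, std::size_t n, std::size_t readable,
                        int key) {
  int lanes[VF];
  std::size_t i = 0;
  while (i < n) {
    // EVL-based tail folding: the last iteration simply asks for fewer lanes.
    const std::size_t evl = std::min(VF, n - i);
    assert(i < readable && "first lane of this iteration would fault");
    const std::size_t loaded =
        loadFaultOnlyFirst(data + i, evl, readable - i, lanes);
    // Early exit: only inspect the lanes that were actually loaded.
    for (std::size_t lane = 0; lane < loaded; ++lane)
      if (lanes[lane] == key)
        return i + lane;
    // Advance by the number of lanes processed, not by VF.
    i += loaded;
  }
  return n;
}

// Sample data used only to illustrate the calls described below.
const int kSample[10] = {7, 3, 9, 1, 4, 8, 0, 0, 0, 0};
```

With this model, `findWithEVL(kSample, 10, 6, 8)` finds the key at index 5 even though only six elements are treated as readable: the second iteration requests four lanes, gets two back, and exits on the match without touching the rest.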
You're right that canFoldTailByMasking() currently blocks this. Since EVL-based tail-folding doesn't mask the loop body, I'll probably need to relax that check specifically for EVL tail-folding. The current VPLane::getAsRuntimeExpr() implementation is another obstacle, because it uses ElementCount to calculate the last lane index.
Good catch on the tests: they currently use trip counts that are multiples of VF, so the tail-folding paths aren't really being exercised. I'll fix that.
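The point about trip counts can be checked with a small sketch. It assumes the simplest policy, where each iteration takes `min(VF, remaining)` elements; real targets may pick the EVL differently, so treat this as a model rather than what the vectorizer emits. The helper names `evlSchedule` and `exercisesPartialVector` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-iteration EVL values for a loop of `tripCount` iterations, assuming
// (for this sketch only) that each vector iteration takes min(vf, remaining).
std::vector<std::size_t> evlSchedule(std::size_t tripCount, std::size_t vf) {
  assert(vf > 0 && "vectorization factor must be non-zero");
  std::vector<std::size_t> evls;
  for (std::size_t done = 0; done < tripCount;) {
    const std::size_t evl = std::min(vf, tripCount - done);
    evls.push_back(evl);
    done += evl;
  }
  return evls;
}

// True if some iteration runs with fewer than `vf` active lanes, i.e. the
// tail-folded path is actually reached for this trip count.
bool exercisesPartialVector(std::size_t tripCount, std::size_t vf) {
  const std::vector<std::size_t> evls = evlSchedule(tripCount, vf);
  return std::any_of(evls.begin(), evls.end(),
                     [vf](std::size_t evl) { return evl < vf; });
}
```

Under this model a trip count of 16 with VF = 4 yields four full iterations and never shortens the EVL, while a trip count of 18 ends with an EVL of 2, which is the case the tests would need to cover.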


https://github.com/llvm/llvm-project/pull/130918

