[PATCH] D103700: [LV] Fix bug when unrolling (only) a loop with non-latch exit

Philip Reames via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 23 16:01:08 PDT 2021


reames updated this revision to Diff 354097.
reames added a comment.

Attempt to incorporate review feedback.

@Ayal I still don't understand your comments on the review.  I think this was what you were asking for (right?), and it doesn't seem to do any harm, but I'm not sure why you think this was needed for correctness.  That doesn't seem to match the code.  If this was not what you had in mind, can I ask you to simply fix the bug proven by the test case yourself?  We don't appear to be making progress here in terms of me understanding the point you're getting at.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103700/new/

https://reviews.llvm.org/D103700

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll


Index: llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
===================================================================
--- llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
+++ llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
@@ -27,7 +27,7 @@
 ; CHECK-NEXT:    [[TMP9:%.*]] = fneg double [[TMP7]]
 ; CHECK-NEXT:    store double [[TMP8]], double* [[TMP4]], align 8
 ; CHECK-NEXT:    store double [[TMP9]], double* [[TMP5]], align 8
-; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1022
 ; CHECK-NEXT:    br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
Index: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1566,14 +1566,14 @@
 
   /// Returns true if we're required to use a scalar epilogue for at least
   /// the final iteration of the original loop.
-  bool requiresScalarEpilogue() const {
+  bool requiresScalarEpilogue(ElementCount VF) const {
     if (!isScalarEpilogueAllowed())
       return false;
     // If we might exit from anywhere but the latch, must run the exiting
     // iteration in scalar form.
     if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
       return true;
-    return InterleaveInfo.requiresScalarEpilogue();
+    return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();
   }
 
   /// Returns true if a scalar epilogue is not allowed due to optsize or a
@@ -3183,18 +3183,13 @@
   // unroll factor (number of SIMD instructions).
   Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");
 
-  // There are two cases where we need to ensure (at least) the last iteration
-  // runs in the scalar remainder loop. Thus, if the step evenly divides
-  // the trip count, we set the remainder to be equal to the step. If the step
-  // does not evenly divide the trip count, no adjustment is necessary since
-  // there will already be scalar iterations. Note that the minimum iterations
-  // check ensures that N >= Step. The cases are:
-  // 1) If there is a non-reversed interleaved group that may speculatively
-  //    access memory out-of-bounds.
-  // 2) If any instruction may follow a conditionally taken exit. That is, if
-  //    the loop contains multiple exiting blocks, or a single exiting block
-  //    which is not the latch.
-  if (VF.isVector() && Cost->requiresScalarEpilogue()) {
+  // There are cases where we *must* run at least one iteration in the remainder
+  // loop.  See the cost model for when this can happen.  If the step evenly
+  // divides the trip count, we set the remainder to be equal to the step. If
+  // the step does not evenly divide the trip count, no adjustment is necessary
+  // since there will already be scalar iterations. Note that the minimum
+  // iterations check ensures that N >= Step.
+  if (Cost->requiresScalarEpilogue(VF)) {
     auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
     R = Builder.CreateSelect(IsZero, Step, R);
   }
@@ -3248,8 +3243,8 @@
   // vector trip count is zero. This check also covers the case where adding one
   // to the backedge-taken count overflowed leading to an incorrect trip count
   // of zero. In this case we will also jump to the scalar loop.
-  auto P = Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE
-                                          : ICmpInst::ICMP_ULT;
+  auto P = Cost->requiresScalarEpilogue(VF) ? ICmpInst::ICMP_ULE
+                                            : ICmpInst::ICMP_ULT;
 
   // If tail is to be folded, vector loop takes care of all iterations.
   Value *CheckMinIters = Builder.getFalse();
@@ -8324,7 +8319,7 @@
   // Generate code to check if the loop's trip count is less than VF * UF of the
   // main vector loop.
   auto P =
-      Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;
+    Cost->requiresScalarEpilogue(VF) ? ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;
 
   Value *CheckMinIters = Builder.CreateICmp(
       P, Count, ConstantInt::get(Count->getType(), VFactor * UFactor),
@@ -8468,7 +8463,7 @@
   // Generate code to check if the loop's trip count is less than VF * UF of the
   // vector epilogue loop.
   auto P =
-      Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;
+    Cost->requiresScalarEpilogue(VF) ? ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;
 
   Value *CheckMinIters = Builder.CreateICmp(
       P, Count,


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D103700.354097.patch
Type: text/x-patch
Size: 4694 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210623/7e2ce38e/attachment.bin>


More information about the llvm-commits mailing list