[llvm] 091422a - [LSR] Fix wrapping bug in lsr-term-fold logic

Philip Reames via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 20 13:47:33 PDT 2023


Author: Philip Reames
Date: 2023-03-20T13:47:21-07:00
New Revision: 091422adc1d7478b126a967c795414840c5c0c97

URL: https://github.com/llvm/llvm-project/commit/091422adc1d7478b126a967c795414840c5c0c97
DIFF: https://github.com/llvm/llvm-project/commit/091422adc1d7478b126a967c795414840c5c0c97.diff

LOG: [LSR] Fix wrapping bug in lsr-term-fold logic

The existing logic was unsound, in two ways.

First, due to wrapping on the trip count computation, it could compute a value which convert a loop which exiting on iteration 256, to one which exited at 255. (With i8 trip counts.)

Second, it allowed rewriting when the trip count implies wrapping around the alternate IV. As a trivial example, it allowed rewriting an i128 exit test in terms of an i64 IV. This is obviously wrong.

Note that the test change is fairly minimal - i.e. only the targeted test - but that's only because I precommitted a change which switched the test from 32 to 64 bit pointers. For 32 bit point architectures with 32 bit primary inductions, this transform is almost always unsound to perform.

Differential Revision: https://reviews.llvm.org/D146429

Added: 
    

Modified: 
    llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
    llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index e76ba2da2212..0a4d815e6720 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -6762,7 +6762,25 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
       continue;
     }
 
-    // FIXME: This does not properly account for overflow.
+    // Check that we can compute the value of AddRec on the exiting iteration
+    // without soundness problems.  There are two cases to be worried about:
+    // 1) BECount could be 255 with type i8.  Simply adding one would be
+    //    incorrect.  We may need one extra bit to represent the unsigned
+    //    trip count.
+    // 2) The multiplication of stride by TC may wrap around.  This is subtle
+    //    because computing the result accounting for wrap is insufficient.
+    //    In order to use the result in an exit test, we must also know that
+    //    AddRec doesn't take the same value on any previous iteration.
+    //    The simplest case to consider is a candidate IV which is narrower
+    //    than the trip count (and thus original IV), but this can also
+    //    happen due to non-unit strides on the candidate IVs.
+    ConstantRange StepCR = SE.getSignedRange(AddRec->getStepRecurrence(SE));
+    ConstantRange BECountCR = SE.getUnsignedRange(BECount);
+    unsigned NoOverflowBitWidth = BECountCR.getActiveBits() + 1 + StepCR.getMinSignedBits();
+    unsigned ARBitWidth = SE.getTypeSizeInBits(AddRec->getType());
+    if (NoOverflowBitWidth > ARBitWidth)
+      continue;
+
     const SCEV *TermValueSLocal = SE.getAddExpr(
         AddRec->getOperand(0),
         SE.getTruncateOrZeroExtend(

diff  --git a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
index a72e85979157..bb6b74ea8c38 100644
--- a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
+++ b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
@@ -192,20 +192,18 @@ for.end:                                          ; preds = %for.body
 ; In this case, the integer IV has a larger bitwidth than the pointer IV.
 ; This means that the smaller IV may wrap around multiple times before
 ; the original loop exit is taken.
-; FIXME: miscompile
 define void @iv_size(ptr %a, i128 %N) {
 ; CHECK-LABEL: @iv_size(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TMP0:%.*]] = trunc i128 [[N:%.*]] to i64
-; CHECK-NEXT:    [[TMP1:%.*]] = shl i64 [[TMP0]], 2
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP1]]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
-; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[A]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[LSR_IV:%.*]] = phi i128 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[N:%.*]], [[ENTRY]] ]
 ; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
+; CHECK-NEXT:    [[LSR_IV_NEXT]] = add nsw i128 [[LSR_IV]], -1
 ; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i64 4
-; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
-; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i128 [[LSR_IV_NEXT]], 0
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret void
 ;


        


More information about the llvm-commits mailing list