[PATCH] D40945: [ScalarEvolution] Improve high cost heuristic in SCEVExpander.

Mon Dec 18 10:23:48 PST 2017

bjope added inline comments.

================
Comment at: test/Transforms/IndVarSimplify/no-iv-rewrite.ll:228
 ; ScalarEvolution is able to understand the loop evolution and that '%iv' at the
-; end of the loop is an even value. Thus '%val' is computed at the end of the
-; loop and the OR instruction is replaced by an ADD keeping the result
-; equivalent.
+; end of the loop is an even value. However, the expression is too expensive to
+; compute as the loop is not guarded, so the OR is not cloned (and changed into
----------------
sanjoy wrote:
> @mzolotukhin - can you please review this bit (that the test update is fine)?
Here is some more input to the review (I promised Bevin that I would keep an eye on this during his vacation).
With this patch I get:

```
define i64 @cloneOr(i32 %limit, i64* %base) #0 {
entry:
  %halfLim = ashr i32 %limit, 2
  %0 = sext i32 %halfLim to i64
  br label %loop

loop:                                             ; preds = %loop, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ 0, %entry ]
  %adr = getelementptr i64, i64* %base, i64 %indvars.iv
  %val = load i64, i64* %adr
  %1 = or i64 %indvars.iv, 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp = icmp slt i64 %indvars.iv.next, %0
  br i1 %cmp, label %loop, label %exit

exit:                                             ; preds = %loop
  %val.lcssa = phi i64 [ %val, %loop ]
  %t3.lcssa = phi i64 [ %1, %loop ]
  %result = and i64 %val.lcssa, %t3.lcssa
  ret i64 %result
}
```
Without the patch:

```
define i64 @cloneOr(i32 %limit, i64* %base) #0 {
entry:
  %halfLim = ashr i32 %limit, 2
  %0 = sext i32 %halfLim to i64
  %1 = icmp sgt i32 %halfLim, 2
  %smax = select i1 %1, i32 %halfLim, i32 2
  %2 = add i32 %smax, -1
  %3 = lshr i32 %2, 1
  %4 = zext i32 %3 to i64
  %5 = shl i64 %4, 1
  br label %loop

loop:                                             ; preds = %loop, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ 0, %entry ]
  %adr = getelementptr i64, i64* %base, i64 %indvars.iv
  %val = load i64, i64* %adr
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp = icmp slt i64 %indvars.iv.next, %0
  br i1 %cmp, label %loop, label %exit

exit:                                             ; preds = %loop
  %val.lcssa = phi i64 [ %val, %loop ]
  %6 = add i64 %5, 1
  %result = and i64 %val.lcssa, %6
  ret i64 %result
}
```
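As a sanity check that the two outputs agree on the value fed into `%result`, here is a minimal Python sketch that models both IR versions. The function names are hypothetical and exist only for this illustration. It assumes `%limit` stays within i32 range, so Python's unbounded integers behave like the IR arithmetic.

```python
def exit_value_in_loop(limit: int) -> int:
    """Model of the 'with patch' IR: the `or` stays in the loop and the
    value from the last iteration reaches the exit block."""
    half_lim = limit >> 2            # ashr i32 %limit, 2 (Python >> is arithmetic)
    iv = 0
    while True:
        t3 = iv | 1                  # %1 = or i64 %indvars.iv, 1
        iv_next = iv + 2             # %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
        if not (iv_next < half_lim): # %cmp = icmp slt i64 %indvars.iv.next, %0
            return t3                # %t3.lcssa
        iv = iv_next


def exit_value_expanded(limit: int) -> int:
    """Model of the 'without patch' IR: the exit value is computed from
    the trip count in the entry block, plus one add in the exit block."""
    half_lim = limit >> 2            # ashr i32 %limit, 2
    smax = max(half_lim, 2)          # icmp sgt + select
    t = (smax - 1) >> 1              # add -1, then lshr 1 (operand is >= 1 here)
    return (t << 1) + 1              # shl 1, then add 1 in the exit block


# Compare the two models over a small range of limits, including negatives.
mismatches = [l for l in range(-64, 257)
              if exit_value_in_loop(l) != exit_value_expanded(l)]
print("mismatches:", mismatches)
```

Because `%indvars.iv` is always even, `or ..., 1` and `add ..., 1` give the same result, which is why the expanded form can end with an `add`.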

It is not obvious to me which one is better (at least, I find it hard to say that this test case shows that the patch is good).
Not having the `or` inside the loop is probably good in most cases (at least when the iteration count is high). If the iteration count is low, the calculation of `%result` is simpler when we leave the `or` inside the loop.
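To make the trade-off concrete, here is a rough Python sketch under a deliberately naive cost model: every IR instruction costs 1 and phi nodes are free. That model and the helper names are assumptions for illustration, not how SCEVExpander estimates cost. The counts come from the two listings: with the patch there is one extra `or` per iteration, and without it there are seven extra one-time instructions (`icmp`, `select`, `add`, `lshr`, `zext`, `shl` in `entry`, plus the `add` in `exit`).

```python
EXTRA_PER_ITERATION_WITH_PATCH = 1   # the `or` left inside the loop
EXTRA_ONE_TIME_WITHOUT_PATCH = 7     # six entry instructions + one add in exit


def trip_count(limit: int) -> int:
    """Number of times the loop body of @cloneOr executes."""
    half_lim = limit >> 2            # ashr i32 %limit, 2
    iv, n = 0, 0
    while True:
        n += 1
        iv += 2                      # %indvars.iv.next
        if not (iv < half_lim):      # loop exits when the compare fails
            return n


def extra_cost_with_patch(limit: int) -> int:
    # Unit-cost assumption: one extra instruction per executed iteration.
    return trip_count(limit) * EXTRA_PER_ITERATION_WITH_PATCH


def extra_cost_without_patch(limit: int) -> int:
    # Unit-cost assumption: a fixed overhead, independent of the trip count.
    return EXTRA_ONE_TIME_WITHOUT_PATCH


for limit in (0, 40, 56, 4000):
    print(limit, trip_count(limit),
          extra_cost_with_patch(limit), extra_cost_without_patch(limit))
```

Under these assumptions the two versions break even at around seven iterations. A real target's costs for `select`, shifts and extensions may move that point in either direction.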

https://reviews.llvm.org/D40945