[PATCH] D40945: [ScalarEvolution] Improve high cost heuristic in SCEVExpander.
Bjorn Pettersson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 18 10:23:48 PST 2017
bjope added inline comments.
================
Comment at: test/Transforms/IndVarSimplify/no-iv-rewrite.ll:228
; ScalarEvolution is able to understand the loop evolution and that '%iv' at the
-; end of the loop is an even value. Thus '%val' is computed at the end of the
-; loop and the OR instruction is replaced by an ADD keeping the result
-; equivalent.
+; end of the loop is an even value. However, the expression is too expensive to
+; compute as the loop is not guarded, so the OR is not cloned (and changed into
----------------
sanjoy wrote:
> @mzolotukhin - can you please review this bit (that the test update is fine)?
Here is some more input to the review (I promised Bevin that I would keep an eye on this during his vacation).
With this patch I get:
```
define i64 @cloneOr(i32 %limit, i64* %base) #0 {
entry:
%halfLim = ashr i32 %limit, 2
%0 = sext i32 %halfLim to i64
br label %loop
loop: ; preds = %loop, %entry
%indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ 0, %entry ]
%adr = getelementptr i64, i64* %base, i64 %indvars.iv
%val = load i64, i64* %adr
%1 = or i64 %indvars.iv, 1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
%cmp = icmp slt i64 %indvars.iv.next, %0
br i1 %cmp, label %loop, label %exit
exit: ; preds = %loop
%val.lcssa = phi i64 [ %val, %loop ]
%t3.lcssa = phi i64 [ %1, %loop ]
%result = and i64 %val.lcssa, %t3.lcssa
ret i64 %result
}
```
Without the patch:
```
define i64 @cloneOr(i32 %limit, i64* %base) #0 {
entry:
%halfLim = ashr i32 %limit, 2
%0 = sext i32 %halfLim to i64
%1 = icmp sgt i32 %halfLim, 2
%smax = select i1 %1, i32 %halfLim, i32 2
%2 = add i32 %smax, -1
%3 = lshr i32 %2, 1
%4 = zext i32 %3 to i64
%5 = shl i64 %4, 1
br label %loop
loop: ; preds = %loop, %entry
%indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ 0, %entry ]
%adr = getelementptr i64, i64* %base, i64 %indvars.iv
%val = load i64, i64* %adr
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
%cmp = icmp slt i64 %indvars.iv.next, %0
br i1 %cmp, label %loop, label %exit
exit: ; preds = %loop
%val.lcssa = phi i64 [ %val, %loop ]
%6 = add i64 %5, 1
%result = and i64 %val.lcssa, %6
ret i64 %result
}
```
It is not completely obvious to me which one is better (at least I find it hard to say that this test case shows that this patch is good).
Not having the `or` inside the loop is probably good in most cases (at least when iteration count is high). If iteration count is low, then the calculation of %result is simpler when we leave the `or` inside the loop.
https://reviews.llvm.org/D40945
More information about the llvm-commits
mailing list