[LLVMdev] missing optimization for icmps in induction variables?

Wed Jan 7 22:06:37 PST 2015

Sanjoy Das wrote:
> Hi all,
>
> I'm trying to get llvm to optimize away the %cmp to true in
>
> define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
>   entry:
>    %length = load i32* %length_ptr, !range !0
>    %len.sub.1 = sub i32 %length, 1
>    %upper = icmp slt i32 %init, %len.sub.1
>    br i1 %upper, label %loop, label %exit
>
>   loop:
>    %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
>    %civ.inc = add i32 %civ, 1
>    %cmp = icmp slt i32 %civ.inc, %length
>    br i1 %cmp, label %latch, label %break
>
>   latch:
>    store i32 0, i32* %array
>    %check = icmp slt i32 %civ.inc, %len.sub.1
>    br i1 %check, label %loop, label %break
>
>   break:
>    ret i32 %civ.inc
>
>   exit:
>    ret i32 42
> }
>
> !0 = !{i32 0, i32 2147483647}
>
>
> One way to prove "%cmp == true" in two steps
>
>   1. notice that since both on the backedge and entry, %civ is known to
>      be less than %len.sub.1, which not i32_signed_max.  This means
>      %civ.inc is an "add nsw".
>
>   2. on both the entry and backedge, we know "%civ `slt` %len.sub.1".
>      This implies "(%civ nsw+ 1) `slt` (%len.sub.1 nsw+ 1)" ==>
>      "%civ.inc `slt` %len".
>
> Currently neither of these happen (i.e. even if I make transformation
> (1) manually, (2) does not kick in).
>
> Is the above reasoning correct?  If it is, what is the right place to
> add this logic to?  I'm thinking ScalarEvolution (so that this gets
> picked up by SimplifyIndVar), but maybe there is a more idiomatic
> place?  The case above is a simplified form of my real workload, which
> involves smin and smax expressions; so the implementation has to be
> easily generalizable to such cases.

Before reading your two steps I was going to suggest jump threading. 
Jump threading is where we optimize redundant tests across blocks that 
feed into branches (block A tests property X and branches to block B 
which also tests property X). However jump threading is powered by lazy 
value info, which I don't think is suited for the sort of reasoning in 
your two steps.

One option is GVN. GVN does have x < y expressions but it doesn't try to 
deduce nuw/nsw bits. It might be possible to add that, but it isn't 
immediately obvious to me how. GVN also does path-sensitive expression 
commoning.

Nick