[llvm-dev] Question on induction variable simplification pass

Thu Apr 13 20:34:30 PDT 2017

Hi Pankaj,

On April 12, 2017 at 5:22:45 PM, Chawla, Pankaj via llvm-dev
(llvm-dev at lists.llvm.org) wrote:
> It looks like the induction variable simplification pass prefers doing a zero-extension
> to compute the wider trip count of loops when extending the IV. This can sometimes result
> in loss of information making ScalarEvolution's analysis conservative which can lead
> to missed performance opportunities.
>
> For example, consider this loopnest-
>
> int i, j;
> for(i=0; i< 40; i++)
> for(j=0; j< i-1; j++)
> A[i+j] = j;
>
>
> We are mainly interested in the backedge taken count of the inner loop.
>
> Before indvars, the backedge information computed by ScalarEvolution is as follows-
>
> Outer loop-
> backedge-taken count is 39
> max backedge-taken count is 39
>
> Inner loop-
> backedge-taken count is {-2,+,1}<%for.cond1.preheader>
> max backedge-taken count is 37
>
>
> After indvars, the backedge information computed by ScalarEvolution is as follows-
>
> Outer loop-
> backedge-taken count is 39
> max backedge-taken count is 39
>
> Inner loop-
> backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> to i64))
> max backedge-taken count is -1

One approach is to use the facts:

 - The inner loop will not be entered in the 0th iteration of
   <%for.cond1.preheader>
 - {-1,+,1}<%for.cond1.preheader> is s< 40

to simplify the above to {-2,+,1}<%for.cond1.preheader> (in i64).  The
original expression was not -2 in the 0th iteration of
<%for.cond1.preheader>, but we don't care about that iteration of
<%for.cond1.preheader> since we won't enter the inner loop.

The other option is to widen of IVs late, as a "lowering"
transformation, like Hal said.  That's a more invasive change, but if
you have time and resources, it would be nice to at least give it a
shot, measure and see what falls over.

> If we generate sext instead of zext, the information computed by ScalarEvolution is
> as follows-
>
> Outer loop-
> backedge-taken count is 39
> max backedge-taken count is 39
>
> Inner loop-
> backedge-taken count is {-2,+,1}<%for.cond1.preheader>
> max backedge-taken count is -2
>
> We now have a simplified backedge taken count for the inner loop which is almost as precise
> as before indvars, except for the flag instead of flag. I think ScalarEvolution

(JFYI: My mail client's compose ate the <nsw> and <nw>)

Can you please share the IR that you piped through SCEV?

My guess is that SCEV did not "try to" infer a more aggressive no-wrap
flag for {-2,+,1} -- most of the no-wrap inferring logic kicks in when
you try to sign/zero extend an add recurrence.

One suspicious bit here is the "max backedge-taken count is -2" bit.
I expected SCEV to have inferred the max trip count to be 37 in
this second case.

-- Sanjoy