[PATCH] Reassociate GEP operands for loop invariant code motion

Mon Apr 20 22:39:11 PDT 2015

On Mon, Apr 20, 2015 at 10:13 PM, Jingyue Wu <jingyue at google.com> wrote:
> I ran `opt -scalar-evolution -analyze` on `simple_licm` and got
>
>   Printing analysis 'Scalar Evolution Analysis' for function 'simple_licm':
>   Classifying expressions for: @simple_licm
>     %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
>     -->  {0,+,1}<nuw><nsw><%loop> U: [0,1000000) S: [0,1000000)           Exits: 999999
>     %idx = add nsw i32 %a, %i
>     -->  {%a,+,1}<nw><%loop> U: full-set S: full-set              Exits: (999999 + %a)
>     %idx.sext = sext i32 %idx to i64
>     -->  (sext i32 {%a,+,1}<nw><%loop> to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)            Exits: (sext i32 (999999 + %a) to i64)
>     %arrayidx = getelementptr i32, i32* %input, i64 %idx.sext
>     -->  ((4 * (sext i32 {%a,+,1}<nw><%loop> to i64)) + %input) U: full-set S: full-set           Exits: ((4 * (sext i32 (999999 + %a) to i64)) + %input)
>     %0 = load i32, i32* %arrayidx
>     -->  %0 U: full-set S: full-set               Exits: <<Unknown>>
>     %i.next = add nuw nsw i32 %i, 1
>     -->  {1,+,1}<nuw><nsw><%loop> U: [1,1000001) S: [1,1000001)           Exits: 1000000
>   Determining loop execution counts for: @simple_licm
>   Loop %loop: backedge-taken count is 999999
>   Loop %loop: max backedge-taken count is 999999
>
> As far as I can see,
>
>   %idx = add nsw i32 %a, %i
>   -->  {%a,+,1}<nw><%loop> U: full-set S: full-set              Exits: (999999 + %a)
>
> `%idx` has only the `<nw>` (no-self-wrap) flag, not `<nsw>`.

As far as I can tell, this is a missing case in SCEV.  You were right
in saying that LSR (through SCEV) does not exploit nsw here.
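
To spell out why the missing `<nsw>` matters: `sext(%a + %i)` can be
split into `sext(%a) + sext(%i)` only if the 32-bit add cannot
signed-overflow.  Here is a small Python model of that arithmetic.  It
is only an illustration; the helper names (`wrap32`, `sext_of_sum`,
`sum_of_sext`) are made up for this sketch and are not LLVM APIs.

```python
def wrap32(x):
    """Reduce an integer to a signed 32-bit value (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sext_of_sum(a, i):
    """Model `sext i32 (add i32 a, i) to i64`: add in 32 bits, then widen."""
    return wrap32(a + i)


def sum_of_sext(a, i):
    """Model `add i64 (sext a), (sext i)`: widen both operands, then add.

    Assumes a and i are already valid signed 32-bit values, so their sum
    always fits in 64 bits and no wraparound is needed.
    """
    return a + i


INT32_MAX = 2**31 - 1

# No signed overflow in 32 bits: the two forms agree.
print(sext_of_sum(5, 7) == sum_of_sext(5, 7))
# Signed overflow in 32 bits: the two forms differ, so the split is invalid.
print(sext_of_sum(INT32_MAX, 1) == sum_of_sext(INT32_MAX, 1))
```

Without a no-signed-wrap fact on `{%a,+,1}`, the second case cannot be
ruled out, so the sext stays wrapped around the whole recurrence.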

> I noticed you recently worked on strengthening `ScalarEvolution`'s handling of nsw, which is pretty awesome! Does it address this issue?

As you can tell from the empirical evidence, the answer is no. :)

The key issue here is that nsw/nuw/poison values are on shaky ground
to begin with.  SCEV pretends that an nuw/nsw addition that feeds into
a PHI can be assumed not to overflow, but depending on how you
interpret the semantics of nuw/nsw in the LangRef, this may or may
not hold.  This is one reason why I've generally tried to make SCEV
more aggressive in inferring no-overflow bits directly from control
flow (i.e. even if nuw/nsw bits are not present in the IR).
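
As a rough sketch of the kind of control-flow reasoning I mean (this is
not SCEV's actual algorithm, and `guarded_increment_cannot_overflow` and
`bound_max` are hypothetical names for this example): in a loop of the
form `for (i = start; i <s bound; i += step)`, the guard `i <s bound`
holds in the body, so `i <= bound - 1`, and that alone can rule out
signed overflow of `i + step` without any nsw bit in the IR.

```python
INT32_MAX = 2**31 - 1


def guarded_increment_cannot_overflow(step, bound_max=INT32_MAX):
    """Return True if `i + step` provably cannot signed-overflow in i32.

    Assumes the increment only executes when the signed guard `i < bound`
    holds, and that `bound` is known to be at most `bound_max`.
    This sketch handles positive steps only.
    """
    if step <= 0:
        return False  # negative or zero steps are not handled here
    max_i_in_body = bound_max - 1  # the guard gives i <= bound - 1
    return max_i_in_body + step <= INT32_MAX


# Unknown bound, step 1: i <= INT32_MAX - 1, so i + 1 cannot overflow.
print(guarded_increment_cannot_overflow(1))
# Unknown bound, step 2: i could be INT32_MAX - 1, so i + 2 might overflow.
print(guarded_increment_cannot_overflow(2))
# Bound known to be at most 1000000: step 2 is safe as well.
print(guarded_increment_cannot_overflow(2, bound_max=1000000))
```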

The second reason I have had to make SCEV smart about FlagNSW/FlagNUW
in the absence of nsw/nuw in the IR is that the programming language
whose performance I'm interested in does not have UB on integer
overflow, so our frontend does not emit any nuw/nsw operations.  :)

-- Sanjoy