<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Program compiled at -O2 results in an infinite loop (Induction Variable Simplification)"

   href="https://bugs.llvm.org/show_bug.cgi?id=32047">32047</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Program compiled at -O2 results in an infinite loop (Induction Variable Simplification)

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>rob.lougher@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>The following program is a reduced test-case from auto-generated code:

=============================== foo.cpp ===============================

extern "C" int printf(const char *, ...);

static void init(unsigned char pred, volatile void *data, unsigned size) {

  unsigned char *bytes = (unsigned char *)data;

  for (unsigned int i = 0; i != size; ++i) {

    bytes[i] = pred + i;

  }

}

int main() {

  unsigned char alpha;

  init(255, &alpha, sizeof(alpha));

  int bravo = 1;

  printf("pre-loop %d\n", bravo);

  for (unsigned char i = 0; i < alpha; ++i) {

    bravo = 2;

  }

  printf("post-loop %d\n", bravo);

}

struct Charlie_ {

  Charlie_() { init(48, &*this, sizeof(*this)); }

  unsigned int a;

} Charlie;

=======================================================================

If we compile this program with -O1 it terminates.  At -O2, however, it hangs:

$ clang --version

clang version 5.0.0 (trunk 295990)

Target: x86_64-unknown-linux-gnu

Thread model: posix

$ clang foo.cpp -O1 -o foo && ./foo

pre-loop 1

post-loop 2

$ clang foo.cpp -O2 -o foo && ./foo

pre-loop 1

< HANG >

If we look at the assembly output it is obvious why the program hangs as the

loop in main() has been replaced by a jmp to itself:

main:                                   # @main

        pushq   %rax

        movl    $.L.str, %edi

        movl    $1, %esi

        xorl    %eax, %eax

        callq   printf

        .p2align        4, 0x90

.LBB0_1:                                # %for.cond

                                        # =>This Inner Loop Header: Depth=1

        jmp     .LBB0_1

There are several steps involved in getting to this position.  This is the IR

for main() after inlining of init().  The loop for.body.i is from init, and

for.cond/for.body is the loop in main: 

======================================================

define i32 @main() local_unnamed_addr #0 {

entry:

  %alpha = alloca i8, align 1

  call void @llvm.lifetime.start(i64 1, i8* nonnull %alpha) #4

  br label %for.body.i

for.body.i:                                       ; preds = %for.body.i, %entry

  %indvars.iv.i = phi i64 [ 0, %entry ], [ %indvars.iv.next.i, %for.body.i ]

  %0 = trunc i64 %indvars.iv.i to i8

  %add.i = add i8 %0, -1

  store i8 %add.i, i8* %alpha, align 1, !tbaa !2

  %indvars.iv.next.i = add nuw nsw i64 %indvars.iv.i, 1

  %cmp.i = icmp eq i64 %indvars.iv.i, 0

  br i1 %cmp.i, label %_ZL4inithPVvj.exit, label %for.body.i

_ZL4inithPVvj.exit:                               ; preds = %for.body.i

  %call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x

i8], [13 x i8]* @.str, i64 0, i64 0), i32 1)

  br label %for.cond

for.cond:                                         ; preds = %for.body,

%_ZL4inithPVvj.exit

  %bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.body ]

  %i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.body ]

  %1 = load i8, i8* %alpha, align 1, !tbaa !2

  %cmp = icmp ult i8 %i.0, %1

  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond

  %bravo.0.lcssa = phi i32 [ %bravo.0, %for.cond ]

  %call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x

i8], [14 x i8]* @.str.1, i64 0, i64 0), i32 %bravo.0.lcssa)

  call void @llvm.lifetime.end(i64 1, i8* nonnull %alpha) #4

  ret i32 0

for.body:                                         ; preds = %for.cond

  %inc = add nuw i8 %i.0, 1

  br label %for.cond

}

======================================================

The first step is the loop from main() is "rotated" to fold the loop latch

(for.body) into for.cond:

======================================================

for.cond:                                         ; preds = %for.cond,

%_ZL4inithPVvj.exit

  %bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]

  %i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]

  %1 = load i8, i8* %alpha, align 1, !tbaa !1

  %cmp = icmp ult i8 %i.0, %1

  %inc = add nuw i8 %i.0, 1

  br i1 %cmp, label %for.cond, label %for.cond.cleanup

======================================================

Several other optimisations then occur, which hoists the load out of the loop,

and instruction combining then replaces the store/load leaving the comparison

against %add.i:

======================================================

for.cond:                                         ; preds = %for.cond,

%_ZL4inithPVvj.exit

  %bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]

  %i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]

  %cmp = icmp ult i8 %i.0, %add.i

  %inc = add nuw i8 %i.0, 1

  br i1 %cmp, label %for.cond, label %for.cond.cleanup

======================================================

The next interesting step is "Induction Variable Simplification".  This

canonicalises the compare into either == or !=.  After loop rotation, the loop

increment (%inc) and the compare are now in the same basic-block.  When

Induction Variable Simplification canonicalises the compare it prefers to

compare against the post-incremented value. To do this, it needs to compare

against the "exit count + 1". If we look at %add.i we see:

  %add.i = add i8 %0, -1

Which means we can use %0 in the compare:

======================================================

for.cond:                                         ; preds = %for.cond,

%_ZL4inithPVvj.exit

  %bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]

  %i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]

  %inc = add nuw i8 %i.0, 1

  %exitcond = icmp ne i8 %inc, %0

  br i1 %exitcond, label %for.cond, label %for.cond.cleanup

======================================================

Next there follows a series of steps optimising the first loop (for.body.i). 

The end result is the loop is optimised away entirely, and %0 ends up as zero:

======================================================

define i32 @main() local_unnamed_addr #0 {

entry:

  br label %_ZL4inithPVvj.exit

_ZL4inithPVvj.exit:                               ; preds = %entry

  %0 = trunc i64 0 to i8

  %call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x

i8], [13 x i8]* @.str, i64 0, i64 0), i32 1)

  br label %for.cond

for.cond:                                         ; preds = %for.cond,

%_ZL4inithPVvj.exit

  %bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]

  %i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]

  %inc = add nuw i8 %i.0, 1

  %exitcond = icmp ne i8 %inc, %0

  br i1 %exitcond, label %for.cond, label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond

  %bravo.0.lcssa = phi i32 [ %bravo.0, %for.cond ]

  %call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x

i8], [14 x i8]* @.str.1, i64 0, i64 0), i32 %bravo.0.lcssa)

  ret i32 0

======================================================

We now have the conditions in place for the never-ending loop.  Note that the

comparison is comparing %inc against %0, which is zero.  The loop increment has

already been increased by 1, and as the add is flagged "no-unsigned-wrap" (nuw)

it cannot be zero (if overflow occurs the result is undefined).  This means the

not-equals compare will always be true.  Global Value Numbering removes the

compare and replaces its result in the branch with true:

======================================================

for.cond:                                         ; preds = %for.cond, %entry

  %bravo.0 = phi i32 [ 1, %entry ], [ 2, %for.cond ]

  %i.0 = phi i8 [ 0, %entry ], [ %inc, %for.cond ]

  %inc = add nuw i8 %i.0, 1

  br i1 true, label %for.cond, label %for.cond.cleanup

======================================================

Finally, "Bit-Tracking Dead Code Elimination" removes the rest of the loop

leaving just the branch:

======================================================

for.cond:                                         ; preds = %for.cond, %entry

  br i1 true, label %for.cond, label %for.cond.cleanup

======================================================

The question is, which of the steps is incorrect?  As far as I can see, this is

the "Induction Variable Simplification" step. 

Before this step we have:

======================================================

for.cond:                                         ; preds = %for.cond,

%_ZL4inithPVvj.exit

  %bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]

  %i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]

  %cmp = icmp ult i8 %i.0, %add.i

  %inc = add nuw i8 %i.0, 1

  br i1 %cmp, label %for.cond, label %for.cond.cleanup

======================================================

We know that %add.i is 255 (0 + -1).  The comparison therefore tests the

pre-incremented value in the range [0, 255], and when it reaches 255 the

compare returns false and the loop terminates.

Afterwards, the compare tests the post-incremented value against zero.  This

requires the add to wrap from 255 to 0, but the add is marked

"no-unsigned-wrap", which means the result of overflow is undefined.  In this

case the transform should either not be done, or the "nuw" flag should be

removed.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>