[llvm-bugs] [Bug 32047] New: Program compiled at -O2 results in an infinite loop (Induction Variable Simplification)
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Feb 23 11:30:27 PST 2017
https://bugs.llvm.org/show_bug.cgi?id=32047
Bug ID: 32047
Summary: Program compiled at -O2 results in an infinite loop
(Induction Variable Simplification)
Product: new-bugs
Version: trunk
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: rob.lougher at gmail.com
CC: llvm-bugs at lists.llvm.org
The following program is a reduced test-case from auto-generated code:
=============================== foo.cpp ===============================
extern "C" int printf(const char *, ...);
static void init(unsigned char pred, volatile void *data, unsigned size) {
unsigned char *bytes = (unsigned char *)data;
for (unsigned int i = 0; i != size; ++i) {
bytes[i] = pred + i;
}
}
int main() {
unsigned char alpha;
init(255, &alpha, sizeof(alpha));
int bravo = 1;
printf("pre-loop %d\n", bravo);
for (unsigned char i = 0; i < alpha; ++i) {
bravo = 2;
}
printf("post-loop %d\n", bravo);
}
struct Charlie_ {
Charlie_() { init(48, &*this, sizeof(*this)); }
unsigned int a;
} Charlie;
=======================================================================
If we compile this program with -O1 it terminates. At -O2, however, it hangs:
$ clang --version
clang version 5.0.0 (trunk 295990)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$ clang foo.cpp -O1 -o foo && ./foo
pre-loop 1
post-loop 2
$ clang foo.cpp -O2 -o foo && ./foo
pre-loop 1
< HANG >
If we look at the assembly output it is obvious why the program hangs as the
loop in main() has been replaced by a jmp to itself:
main: # @main
pushq %rax
movl $.L.str, %edi
movl $1, %esi
xorl %eax, %eax
callq printf
.p2align 4, 0x90
.LBB0_1: # %for.cond
# =>This Inner Loop Header: Depth=1
jmp .LBB0_1
There are several steps involved in getting to this position. This is the IR
for main() after inlining of init(). The loop for.body.i is from init, and
for.cond/for.body is the loop in main:
======================================================
define i32 @main() local_unnamed_addr #0 {
entry:
%alpha = alloca i8, align 1
call void @llvm.lifetime.start(i64 1, i8* nonnull %alpha) #4
br label %for.body.i
for.body.i: ; preds = %for.body.i, %entry
%indvars.iv.i = phi i64 [ 0, %entry ], [ %indvars.iv.next.i, %for.body.i ]
%0 = trunc i64 %indvars.iv.i to i8
%add.i = add i8 %0, -1
store i8 %add.i, i8* %alpha, align 1, !tbaa !2
%indvars.iv.next.i = add nuw nsw i64 %indvars.iv.i, 1
%cmp.i = icmp eq i64 %indvars.iv.i, 0
br i1 %cmp.i, label %_ZL4inithPVvj.exit, label %for.body.i
_ZL4inithPVvj.exit: ; preds = %for.body.i
%call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x
i8], [13 x i8]* @.str, i64 0, i64 0), i32 1)
br label %for.cond
for.cond: ; preds = %for.body,
%_ZL4inithPVvj.exit
%bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.body ]
%i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.body ]
%1 = load i8, i8* %alpha, align 1, !tbaa !2
%cmp = icmp ult i8 %i.0, %1
br i1 %cmp, label %for.body, label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond
%bravo.0.lcssa = phi i32 [ %bravo.0, %for.cond ]
%call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x
i8], [14 x i8]* @.str.1, i64 0, i64 0), i32 %bravo.0.lcssa)
call void @llvm.lifetime.end(i64 1, i8* nonnull %alpha) #4
ret i32 0
for.body: ; preds = %for.cond
%inc = add nuw i8 %i.0, 1
br label %for.cond
}
======================================================
The first step is the loop from main() is "rotated" to fold the loop latch
(for.body) into for.cond:
======================================================
for.cond: ; preds = %for.cond,
%_ZL4inithPVvj.exit
%bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]
%i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]
%1 = load i8, i8* %alpha, align 1, !tbaa !1
%cmp = icmp ult i8 %i.0, %1
%inc = add nuw i8 %i.0, 1
br i1 %cmp, label %for.cond, label %for.cond.cleanup
======================================================
Several other optimisations then occur, which hoists the load out of the loop,
and instruction combining then replaces the store/load leaving the comparison
against %add.i:
======================================================
for.cond: ; preds = %for.cond,
%_ZL4inithPVvj.exit
%bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]
%i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]
%cmp = icmp ult i8 %i.0, %add.i
%inc = add nuw i8 %i.0, 1
br i1 %cmp, label %for.cond, label %for.cond.cleanup
======================================================
The next interesting step is "Induction Variable Simplification". This
canonicalises the compare into either == or !=. After loop rotation, the loop
increment (%inc) and the compare are now in the same basic-block. When
Induction Variable Simplification canonicalises the compare it prefers to
compare against the post-incremented value. To do this, it needs to compare
against the "exit count + 1". If we look at %add.i we see:
%add.i = add i8 %0, -1
Which means we can use %0 in the compare:
======================================================
for.cond: ; preds = %for.cond,
%_ZL4inithPVvj.exit
%bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]
%i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]
%inc = add nuw i8 %i.0, 1
%exitcond = icmp ne i8 %inc, %0
br i1 %exitcond, label %for.cond, label %for.cond.cleanup
======================================================
Next there follows a series of steps optimising the first loop (for.body.i).
The end result is the loop is optimised away entirely, and %0 ends up as zero:
======================================================
define i32 @main() local_unnamed_addr #0 {
entry:
br label %_ZL4inithPVvj.exit
_ZL4inithPVvj.exit: ; preds = %entry
%0 = trunc i64 0 to i8
%call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x
i8], [13 x i8]* @.str, i64 0, i64 0), i32 1)
br label %for.cond
for.cond: ; preds = %for.cond,
%_ZL4inithPVvj.exit
%bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]
%i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]
%inc = add nuw i8 %i.0, 1
%exitcond = icmp ne i8 %inc, %0
br i1 %exitcond, label %for.cond, label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond
%bravo.0.lcssa = phi i32 [ %bravo.0, %for.cond ]
%call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([14 x
i8], [14 x i8]* @.str.1, i64 0, i64 0), i32 %bravo.0.lcssa)
ret i32 0
======================================================
We now have the conditions in place for the never-ending loop. Note that the
comparison is comparing %inc against %0, which is zero. The loop increment has
already been increased by 1, and as the add is flagged "no-unsigned-wrap" (nuw)
it cannot be zero (if overflow occurs the result is undefined). This means the
not-equals compare will always be true. Global Value Numbering removes the
compare and replaces its result in the branch with true:
======================================================
for.cond: ; preds = %for.cond, %entry
%bravo.0 = phi i32 [ 1, %entry ], [ 2, %for.cond ]
%i.0 = phi i8 [ 0, %entry ], [ %inc, %for.cond ]
%inc = add nuw i8 %i.0, 1
br i1 true, label %for.cond, label %for.cond.cleanup
======================================================
Finally, "Bit-Tracking Dead Code Elimination" removes the rest of the loop
leaving just the branch:
======================================================
for.cond: ; preds = %for.cond, %entry
br i1 true, label %for.cond, label %for.cond.cleanup
======================================================
The question is, which of the steps is incorrect? As far as I can see, this is
the "Induction Variable Simplification" step.
Before this step we have:
======================================================
for.cond: ; preds = %for.cond,
%_ZL4inithPVvj.exit
%bravo.0 = phi i32 [ 1, %_ZL4inithPVvj.exit ], [ 2, %for.cond ]
%i.0 = phi i8 [ 0, %_ZL4inithPVvj.exit ], [ %inc, %for.cond ]
%cmp = icmp ult i8 %i.0, %add.i
%inc = add nuw i8 %i.0, 1
br i1 %cmp, label %for.cond, label %for.cond.cleanup
======================================================
We know that %add.i is 255 (0 + -1). The comparison therefore tests the
pre-incremented value in the range [0, 255], and when it reaches 255 the
compare returns false and the loop terminates.
Afterwards, the compare tests the post-incremented value against zero. This
requires the add to wrap from 255 to 0, but the add is marked
"no-unsigned-wrap", which means the result of overflow is undefined. In this
case the transform should either not be done, or the "nuw" flag should be
removed.
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20170223/f5a2d832/attachment.html>
More information about the llvm-bugs
mailing list