[llvm-dev] clang10 mis-compiles simple C program transpiled from brainfxxk

Haoran Xu via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 21 22:32:02 PDT 2020


I was just able to determine the offending IR code before and after the
transformation. I'm now almost certain it's a bug in LLVM.

Before transformation, we have the following IR (I renamed all %xxx for
brevity):

> %1 = load i8, i8* %0, align 1
> %2 = add i8 %1, -1
> store i8 %2, i8* %0, align 1
>
The above IR is inside a loop, so the value stored at %0 can be different on
each iteration.

The optimization pass changed the IR above to the following:

> store i8 %3, i8* %0, align 1
>
where %3 is defined by

> %4 = load i8, i8* %0, align 1
> %3 = add i8 %4, -1
>
in an earlier piece of IR.

Apparently the pass treated %3 as the same value as %2 and applied CSE,
without realizing that the memory at %0 may have been changed by the loop.
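
To make the difference concrete, here is a minimal C sketch of the two behaviors. It is a model only, not the actual output of the pass: the function names and the MAX_ITERS cap are hypothetical, and I am assuming that %3 is computed once before the loop is entered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap so that the modelled miscompile terminates. */
#define MAX_ITERS 1000

/* Expected semantics: *ptr is reloaded on every iteration. */
static int loop_reloading(uint8_t *ptr) {
    int iters = 0;
    assert(ptr != NULL);
    while (*ptr && iters < MAX_ITERS) {
        uint8_t v = *ptr;        /* %1 = load i8, i8* %0 */
        *ptr = (uint8_t)(v - 1); /* %2 = add i8 %1, -1; store %2 */
        ++ptr;
        ++(*ptr);
        --ptr;
        ++iters;
    }
    return iters;
}

/* Model of the transformed IR: the decremented value comes from an
   earlier load (assumed here to happen once, before the loop) and the
   same value is stored on every iteration. */
static int loop_stale(uint8_t *ptr) {
    int iters = 0;
    uint8_t stale;
    assert(ptr != NULL);
    stale = (uint8_t)(*ptr - 1); /* %4 = load; %3 = add i8 %4, -1 */
    while (*ptr && iters < MAX_ITERS) {
        *ptr = stale;            /* store i8 %3, i8* %0 */
        ++ptr;
        ++(*ptr);
        --ptr;
        ++iters;
    }
    return iters;
}
```

Starting from a two-cell buffer {3, 0}, loop_reloading stops after 3 iterations with the cells at {0, 3}, while loop_stale keeps storing 2 into the first cell and only stops when it hits the MAX_ITERS cap. That matches the infinite loop I am seeing.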


On Wed, Oct 21, 2020 at 10:18 PM David Blaikie <dblaikie at gmail.com> wrote:

> Might be worth running the c source file through creduce or similar to
> narrow it down a bit that way too.
>
> On Wed, Oct 21, 2020 at 9:12 PM Haoran Xu via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> A further bisect using opt's -opt-bisect-limit option shows that the
>> following pass is causing the issue:
>>
>>> BISECT: running pass (39) Early CSE w/ MemorySSA on function (main)
>>>
>>
>>
>> On Wed, Oct 21, 2020 at 9:00 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
>>
>>> I did a simple bisect over clang versions, and it seems that clang 8.0.0
>>> works correctly, but clang 9.0.0 miscompiles the code.
>>> https://godbolt.org/z/676Grr  <- if you change the clang version to
>>> 8.0.0, you will see the expected output in the 'output' section.
>>> I don't have the ability to bisect the clang git history. I would greatly
>>> appreciate it if anyone is willing to do that.
>>>
>>> Thanks!
>>>
>>> On Wed, Oct 21, 2020 at 8:47 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm really amazed to find out that under -O3, a simple piece of C code
>>>> generated from a brainfxxk-to-C transpiler is miscompiled.
>>>> As one probably knows, the C code transpiled from brainfxxk only
>>>> contains 3 kinds of statements:
>>>>
>>>>> (1) ++(*ptr) / --(*ptr)
>>>>> (2) ++ptr / --ptr
>>>>> (3) while (*ptr) { ... }
>>>>>
>>>> where ptr is a uint8_t*.
>>>> So it seems very clear to me that the code contains no undefined
>>>> behavior (the pointer is uint8_t* and unsigned integer overflow is not UB).
>>>>
>>>> After further investigation, it seems like clang compiled this loop:
>>>>
>>>>> while (*ptr) {
>>>>>  --(*ptr);
>>>>>  ++ptr;
>>>>>  ++(*ptr);
>>>>>  --ptr;
>>>>> }
>>>>>
>>>>  to an unconditional infinite loop under -O3, resulting in the bug. The
>>>> code snippet above seems completely benign to me.
>>>>
>>>> I attached the offending program. With
>>>>
>>>>> clang a.c -O0
>>>>>
>>>> it worked fine (it should print out an ASCII-art picture of the Mandelbrot
>>>> fractal). However, with -O1 or -O3, it goes into an infinite loop (in the
>>>> code snippet above) after printing out a few characters.
>>>>
>>>> I also tried UndefinedBehaviorSanitizer. Strangely, when compiling
>>>> using
>>>>
>>>>> clang a.c -O3  -fsanitize=undefined
>>>>>
>>>> the code worked again, with no infinite loop, and no undefined behavior
>>>> reported.
>>>>
>>>> So it seems to me to be an LLVM optimizer bug. I would greatly appreciate
>>>> it if anyone is willing to investigate.
>>>>
>>>> Best,
>>>> Haoran
>>>>
>>>>
>>>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>