[llvm-dev] clang10 mis-compiles simple C program transpiled from brainfxxk
Nikita Popov via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 22 06:41:21 PDT 2020
I suspect that you found the same issue as
https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e.
We're trying to find a solution to that, but it's not entirely simple.
Nikita
On Thu, Oct 22, 2020 at 3:33 PM Haoran Xu via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I managed to bisected out the diff introducing the bug:
> https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02
>
> And I managed to figured out a fix: it seems (to me) that the diff is
> mostly a code refactoring diff, but the author accidentally removed two
> lines of functional logic, resulting in the bug.
> After adding the two lines back, the miscompiled code works fine now.
> However, I'm not certain of the performance implications of clearing the
> whole AAQI.
>
>> diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>> b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>> index 232132958f7..9a8ca46093d 100644
>> --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>> +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>> @@ -836,6 +836,8 @@ AliasResult BasicAAResult::alias(const MemoryLocation
>> &LocA,
>> AliasResult Alias = aliasCheck(LocA.Ptr, LocA.Size, LocA.AATags,
>> LocB.Ptr,
>> LocB.Size, LocB.AATags, AAQI);
>>
>> + AAQI.AliasCache.shrink_and_clear();
>> + AAQI.IsCapturedCache.shrink_and_clear();
>> VisitedPhiBBs.clear();
>> return Alias;
>> }
>>
>
> I also noticed that the author forgot to change a few function calls to
> use his new API, but I'm not sure if it is a correctness issue or not. At
> least it's not causing the brainfxxk code miscompilation.
>
>> diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp
>> b/llvm/lib/Analysis/AliasAnalysis.cpp
>> index 06a33f659c1..94a9d12eb17 100644
>> --- a/llvm/lib/Analysis/AliasAnalysis.cpp
>> +++ b/llvm/lib/Analysis/AliasAnalysis.cpp
>> @@ -231,7 +231,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase
>> *Call,
>>
>> // If Loc is a constant memory location, the call definitely could not
>> // modify the memory location.
>> - if (isModSet(Result) && pointsToConstantMemory(Loc, /*OrLocal*/ false))
>> + if (isModSet(Result) && pointsToConstantMemory(Loc, AAQI, /*OrLocal*/
>> false))
>> Result = clearMod(Result);
>>
>> return Result;
>> @@ -308,7 +308,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase
>> *Call1,
>>
>> // ModRefC1 indicates what Call1 might do to Call2ArgLoc, and we
>> use
>> // above ArgMask to update dependence info.
>> - ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc);
>> + ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc, AAQI);
>> ArgMask = intersectModRef(ArgMask, ModRefC1);
>>
>> // Conservatively clear IsMustAlias unless only MustAlias is found.
>> @@ -349,7 +349,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase
>> *Call1,
>> // might Mod Call1ArgLoc, then we care about either a Mod or a Ref
>> by
>> // Call2. If Call1 might Ref, then we care only about a Mod by
>> Call2.
>> ModRefInfo ArgModRefC1 = getArgModRefInfo(Call1, Call1ArgIdx);
>> - ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc);
>> + ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc, AAQI);
>> if ((isModSet(ArgModRefC1) && isModOrRefSet(ModRefC2)) ||
>> (isRefSet(ArgModRefC1) && isModSet(ModRefC2)))
>> R = intersectModRef(unionModRef(R, ArgModRefC1), Result);
>>
>
> I'll submit a patch for review, but I would really appreciate any comments
> here as well.
>
> Thanks.
>
> Best,
> Haoran
>
> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月22日周四 上午12:12写道:
>
>> Hi David,
>>
>> I just tried creduce, but it generated a code with completely undefined
>> behavior (out of range accesses, etc).
>> The "interesting" criteria I used is "timeout in -O1 but finishes in 20s
>> in -O0". However, it turns out that the undefined behavior in
>> creduce-generated code just makes the criteria "happens to" pass.
>>
>> Best,
>> Haoran
>>
>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午10:32写道:
>>
>>> I was just able to determine the offending IR code before and after the
>>> transformation. I'm now almost certain it's a bug in LLVM.
>>>
>>> Before transformation, we have the following IR (I renamed all %xxx for
>>> brevity):
>>>
>>>> %1 = load i8, i8* %0, align 1
>>>> %2 = add i8 %1, -1
>>>> store i8 %2, i8* %0, align 1
>>>>
>>> The above IR is inside a loop, so the value in %0 can be different in
>>> each run.
>>>
>>> The optimization pass changed the IR above to the following:
>>>
>>>> store i8 %3, i8* %0, align 1
>>>>
>>> where %3 is defined by
>>>
>>>> %4 = load i8, i8* %0, align 1
>>>> %3 = add i8 %4, -1
>>>>
>>> in an earlier piece of IR.
>>>
>>> Apparently the pass treated %3 the same thing as %2 and it fired CSE,
>>> without realizing that the content in %0 may have been changed by the loop.
>>>
>>>
>>> David Blaikie <dblaikie at gmail.com> 于2020年10月21日周三 下午10:18写道:
>>>
>>>> Might be worth running the c source file through creduce or similar to
>>>> narrow it down a bit that way too.
>>>>
>>>> On Wed, Oct 21, 2020 at 9:12 PM Haoran Xu via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> A further bisect using opt's -opt-bisect-limit option shows that the
>>>>> following pass is causing the issue:
>>>>>
>>>>>> BISECT: running pass (39) Early CSE w/ MemorySSA on function (main)
>>>>>>
>>>>>
>>>>>
>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午9:00写道:
>>>>>
>>>>>> I did a simple bisect on clang version, and it seems like clang 8.0.0
>>>>>> works correctly, but clang 9.0.0 failed to compile the code correctly.
>>>>>> https://godbolt.org/z/676Grr <- if you change the clang version to
>>>>>> 8.0.0, you will see the expected output in 'output' section.
>>>>>> I don't have the ability to bisect on clang git history. I would
>>>>>> greatly appreciate it if any one is willing to do that.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午8:47写道:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm really amazed to find out that under -O3, a simple piece of C
>>>>>>> code generated from a brainfxxk-to-C transpiler is miscompiled.
>>>>>>> As one probably know, the C code transpiled from brainfxxk only
>>>>>>> contains 3 kind of statements:
>>>>>>>
>>>>>>>> (1) ++(*ptr) / --(*ptr)
>>>>>>>> (2) ++ptr / --ptr
>>>>>>>> (3) while (*ptr) { ... }
>>>>>>>>
>>>>>>> where ptr is a uint8_t*.
>>>>>>> So it seems very clear to me that the code contains no undefined
>>>>>>> behavior (the pointer is uint8_t* and unsigned integer overflow is not UD).
>>>>>>>
>>>>>>> After further investigation, it seems like clang compiled this loop:
>>>>>>>
>>>>>>>> while (*ptr) {
>>>>>>>> --(*ptr);
>>>>>>>> ++ptr;
>>>>>>>> ++(*ptr);
>>>>>>>> --ptr;
>>>>>>>> }
>>>>>>>>
>>>>>>> to an unconditional infinite loop under -O3, resulting in the bug.
>>>>>>> The code snippet above seems completely benign to me.
>>>>>>>
>>>>>>> I attached the offending program. With
>>>>>>>
>>>>>>>> clang a.c -O0
>>>>>>>>
>>>>>>> it worked fine (it should print out an ASCII-art picture of
>>>>>>> mandelbrot fracture). However, with -O1 or -O3, it goes into a dead loop
>>>>>>> (in the code snippet above) after printing out a few characters.
>>>>>>>
>>>>>>> I also tried UndefinedBehaviorSanitizer. Strangely, when compiling
>>>>>>> using
>>>>>>>
>>>>>>>> clang a.c -O3 -fsanitize=undefined
>>>>>>>>
>>>>>>> the code worked again, with no infinite loop, and no undefined
>>>>>>> behavior reported.
>>>>>>>
>>>>>>> So it seems to me a LLVM optimizer bug. I would greatly appreciate
>>>>>>> if any one is willing to investigate.
>>>>>>>
>>>>>>> Best,
>>>>>>> Haoran
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201022/4c5e78a7/attachment.html>
More information about the llvm-dev
mailing list