[llvm-dev] clang10 mis-compiles simple C program transpiled from brainfxxk

Fri Oct 23 13:25:06 PDT 2020

Hi Alina,

Thanks for your reply! That makes sense to me.
Regarding the other point I mentioned (the missing AAQI parameters in some
of the function calls), do you think they make sense, or is it intentional
that you use the non-batchAA version there?
My current understanding is that changing them to use BatchAA might expose
more issues that triggers the `isValueEqualInPotentialCycles` bug, but if
you have the bug fixed, make them use BatchAA as well could probably
improve performance a bit.

Best,
Haoran

Alina Sbirlea <asbirlea at google.com> 于2020年10月23日周五 上午11:21写道：

> Some clarifications to the description in https://reviews.llvm.org/D89991:
> The clearing of the cache is not supposed to be done in BatchAA. The patch
> introducing BatchAA was not a refactoring patch, and it did not
> accidentally miss the lines to clear those caches.
>
> Nikita flagged the issue and I looked into it. The condition that's broken
> in BatchAA is the `isValueEqualInPotentialCycles`. This condition can be
> satisfied in a query and a result cached, but the condition broken in a
> subsequent query. Working out a fix is not straightforward.
> Yes, clearing the whole cache is a big hammer we can use, and, yes, it
> will lose all the performance benefits.
>
> For visibility, I uploaded a WIP fix:
> https://reviews.llvm.org/D90062
> There are still problems I have yet to solve in the above patch, hence the
> lack of reviewers. However, please feel free to test it and let me know if
> it resolves your failure.
>
> Best,
> Alina
>
>
> On Thu, Oct 22, 2020 at 6:47 AM Haoran Xu <haoranxu510 at gmail.com> wrote:
>
>> I don't know anything about LLVM internals, but solely from the commit
>> message of your diff I think it's very likely.
>> So my current understanding is that the ``shrink_and_clear()`` is
>> required to produce correct aliasing result, but if we naively do it for
>> AAQI (as my patch proposed), we probably lose most of the performance
>> benefits of BatchedAA.
>>
>> However, this is only my understanding after spending 2 hours reading the
>> code from zero background knowledge, so I guess I can't say much about it.
>>
>>
>> Nikita Popov <nikita.ppv at gmail.com> 于2020年10月22日周四 上午6:41写道：
>>
>>> I suspect that you found the same issue as
>>> https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e.
>>> We're trying to find a solution to that, but it's not entirely simple.
>>>
>>> Nikita
>>>
>>> On Thu, Oct 22, 2020 at 3:33 PM Haoran Xu via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> I managed to bisected out the diff introducing the bug:
>>>> https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02
>>>>
>>>> And I managed to figured out a fix: it seems (to me) that the diff is
>>>> mostly a code refactoring diff, but the author accidentally removed two
>>>> lines of functional logic, resulting in the bug.
>>>> After adding the two lines back, the miscompiled code works fine now.
>>>> However, I'm not certain of the performance implications of clearing the
>>>> whole AAQI.
>>>>
>>>>> diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>> b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>> index 232132958f7..9a8ca46093d 100644
>>>>> --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>> +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>> @@ -836,6 +836,8 @@ AliasResult BasicAAResult::alias(const
>>>>> MemoryLocation &LocA,
>>>>>    AliasResult Alias = aliasCheck(LocA.Ptr, LocA.Size, LocA.AATags,
>>>>> LocB.Ptr,
>>>>>                                   LocB.Size, LocB.AATags, AAQI);
>>>>>
>>>>> +  AAQI.AliasCache.shrink_and_clear();
>>>>> +  AAQI.IsCapturedCache.shrink_and_clear();
>>>>>    VisitedPhiBBs.clear();
>>>>>    return Alias;
>>>>>  }
>>>>>
>>>>
>>>> I also noticed that the author forgot to change a few function calls to
>>>> use his new API, but I'm not sure if it is a correctness issue or not. At
>>>> least it's not causing the brainfxxk code miscompilation.
>>>>
>>>>> diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>> b/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>> index 06a33f659c1..94a9d12eb17 100644
>>>>> --- a/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>> +++ b/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>> @@ -231,7 +231,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase
>>>>> *Call,
>>>>>
>>>>>    // If Loc is a constant memory location, the call definitely could
>>>>> not
>>>>>    // modify the memory location.
>>>>> -  if (isModSet(Result) && pointsToConstantMemory(Loc, /*OrLocal*/
>>>>> false))
>>>>> +  if (isModSet(Result) && pointsToConstantMemory(Loc, AAQI,
>>>>> /*OrLocal*/ false))
>>>>>      Result = clearMod(Result);
>>>>>
>>>>>    return Result;
>>>>> @@ -308,7 +308,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase
>>>>> *Call1,
>>>>>
>>>>>        // ModRefC1 indicates what Call1 might do to Call2ArgLoc, and
>>>>> we use
>>>>>        // above ArgMask to update dependence info.
>>>>> -      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc);
>>>>> +      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc, AAQI);
>>>>>        ArgMask = intersectModRef(ArgMask, ModRefC1);
>>>>>
>>>>>        // Conservatively clear IsMustAlias unless only MustAlias is
>>>>> found.
>>>>> @@ -349,7 +349,7 @@ ModRefInfo AAResults::getModRefInfo(const CallBase
>>>>> *Call1,
>>>>>        // might Mod Call1ArgLoc, then we care about either a Mod or a
>>>>> Ref by
>>>>>        // Call2. If Call1 might Ref, then we care only about a Mod by
>>>>> Call2.
>>>>>        ModRefInfo ArgModRefC1 = getArgModRefInfo(Call1, Call1ArgIdx);
>>>>> -      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc);
>>>>> +      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc, AAQI);
>>>>>        if ((isModSet(ArgModRefC1) && isModOrRefSet(ModRefC2)) ||
>>>>>            (isRefSet(ArgModRefC1) && isModSet(ModRefC2)))
>>>>>          R = intersectModRef(unionModRef(R, ArgModRefC1), Result);
>>>>>
>>>>
>>>> I'll submit a patch for review, but I would really appreciate any
>>>> comments here as well.
>>>>
>>>> Thanks.
>>>>
>>>> Best,
>>>> Haoran
>>>>
>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月22日周四 上午12:12写道：
>>>>
>>>>> Hi David,
>>>>>
>>>>> I just tried creduce, but it generated a code with completely
>>>>> undefined behavior (out of range accesses, etc).
>>>>> The "interesting" criteria I used is "timeout in -O1 but finishes in
>>>>> 20s in -O0". However, it turns out that the undefined behavior in
>>>>> creduce-generated code just makes the criteria "happens to" pass.
>>>>>
>>>>> Best,
>>>>> Haoran
>>>>>
>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午10:32写道：
>>>>>
>>>>>> I was just able to determine the offending IR code before and after
>>>>>> the transformation. I'm now almost certain it's a bug in LLVM.
>>>>>>
>>>>>> Before transformation, we have the following IR (I renamed all %xxx
>>>>>> for brevity):
>>>>>>
>>>>>>> %1 = load i8, i8* %0, align 1
>>>>>>> %2 = add i8 %1, -1
>>>>>>> store i8 %2, i8* %0, align 1
>>>>>>>
>>>>>> The above IR is inside a loop, so the value in %0 can be different in
>>>>>> each run.
>>>>>>
>>>>>> The optimization pass changed the IR above to the following:
>>>>>>
>>>>>>> store i8 %3, i8* %0, align 1
>>>>>>>
>>>>>> where %3 is defined by
>>>>>>
>>>>>>> %4 = load i8, i8* %0, align 1
>>>>>>> %3 = add i8 %4, -1
>>>>>>>
>>>>>> in an earlier piece of IR.
>>>>>>
>>>>>> Apparently the pass treated %3 the same thing as %2 and it fired CSE,
>>>>>> without realizing that the content in %0 may have been changed by the loop.
>>>>>>
>>>>>>
>>>>>> David Blaikie <dblaikie at gmail.com> 于2020年10月21日周三 下午10:18写道：
>>>>>>
>>>>>>> Might be worth running the c source file through creduce or similar
>>>>>>> to narrow it down a bit that way too.
>>>>>>>
>>>>>>> On Wed, Oct 21, 2020 at 9:12 PM Haoran Xu via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> A further bisect using opt's -opt-bisect-limit option shows that
>>>>>>>> the following pass is causing the issue:
>>>>>>>>
>>>>>>>>> BISECT: running pass (39) Early CSE w/ MemorySSA on function (main)
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午9:00写道：
>>>>>>>>
>>>>>>>>> I did a simple bisect on clang version, and it seems like clang
>>>>>>>>> 8.0.0 works correctly, but clang 9.0.0 failed to compile the code correctly.
>>>>>>>>> https://godbolt.org/z/676Grr  <- if you change the clang version
>>>>>>>>> to 8.0.0, you will see the expected output in 'output' section.
>>>>>>>>> I don't have the ability to bisect on clang git history. I would
>>>>>>>>> greatly appreciate it if any one is willing to do that.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午8:47写道：
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I'm really amazed to find out that under -O3, a simple piece of C
>>>>>>>>>> code generated from a brainfxxk-to-C transpiler is miscompiled.
>>>>>>>>>> As one probably know, the C code transpiled from brainfxxk only
>>>>>>>>>> contains 3 kind of statements:
>>>>>>>>>>
>>>>>>>>>>> (1) ++(*ptr) / --(*ptr)
>>>>>>>>>>> (2) ++ptr / --ptr
>>>>>>>>>>> (3) while (*ptr) { ... }
>>>>>>>>>>>
>>>>>>>>>> where ptr is a uint8_t*.
>>>>>>>>>> So it seems very clear to me that the code contains no undefined
>>>>>>>>>> behavior (the pointer is uint8_t* and unsigned integer overflow is not UD).
>>>>>>>>>>
>>>>>>>>>> After further investigation, it seems like clang compiled this
>>>>>>>>>> loop:
>>>>>>>>>>
>>>>>>>>>>> while (*ptr) {
>>>>>>>>>>>  --(*ptr);
>>>>>>>>>>>  ++ptr;
>>>>>>>>>>>  ++(*ptr);
>>>>>>>>>>>  --ptr;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>  to an unconditional infinite loop under -O3, resulting in the
>>>>>>>>>> bug. The code snippet above seems completely benign to me.
>>>>>>>>>>
>>>>>>>>>> I attached the offending program. With
>>>>>>>>>>
>>>>>>>>>>> clang a.c -O0
>>>>>>>>>>>
>>>>>>>>>> it worked fine (it should print out an ASCII-art picture of
>>>>>>>>>> mandelbrot fracture). However, with -O1 or -O3, it goes into a dead loop
>>>>>>>>>> (in the code snippet above) after printing out a few characters.
>>>>>>>>>>
>>>>>>>>>> I also tried UndefinedBehaviorSanitizer. Strangely, when
>>>>>>>>>> compiling using
>>>>>>>>>>
>>>>>>>>>>> clang a.c -O3  -fsanitize=undefined
>>>>>>>>>>>
>>>>>>>>>> the code worked again, with no infinite loop, and no undefined
>>>>>>>>>> behavior reported.
>>>>>>>>>>
>>>>>>>>>> So it seems to me a LLVM optimizer bug. I would greatly
>>>>>>>>>> appreciate if any one is willing to investigate.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Haoran
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201023/35ea1ac5/attachment.html>