[llvm-dev] clang10 mis-compiles simple C program transpiled from brainfxxk

Fri Oct 23 17:20:53 PDT 2020

That's interesting. I didn't try trunk since I thought LLVM 11 is already
the latest. Thanks again!

Alina Sbirlea <asbirlea at google.com> 于2020年10月23日周五 下午5:15写道：

> Hi Haoran,
>
> I think you may have come across an issue that was fixed already. I cannot
> reproduce the failure at ToT.
> In godbolt it reproduces with the clang11 release, but not with trunk.
>
> Best,
> Alina
>
>
> On Fri, Oct 23, 2020 at 4:25 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
>
>> Hi Alina,
>>
>> Thanks for your reply. I don't have commit access (it's my first patch to
>> LLVM), so please do it on my behalf.
>> It's kind of hard for me to compile LLVM on my local machine, so can you
>> just check that the failing code I attached in original email works (i.e.
>> does not dead-loop and prints out a fracture-like ASCII art) in the
>> original email after Nikita's diff?
>>
>> Thanks!
>> Haoran
>>
>>
>> Alina Sbirlea <asbirlea at google.com> 于2020年10月23日周五 下午3:48写道：
>>
>>> Hi Haoran,
>>>
>>> Yes, you're right, the instances you added in the patch can also pass
>>> along the AAQI param.
>>> Do you have commit access to check in this change or should I do it on
>>> your behalf?
>>>
>>> The fix Nikita proposed (https://reviews.llvm.org/D90066) should
>>> resolve the original issue in AA and BatchAA, but please let us know if
>>> you're still seeing it with this patch.
>>>
>>> Best,
>>> Alina
>>>
>>> On Fri, Oct 23, 2020 at 1:25 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
>>>
>>>> Hi Alina,
>>>>
>>>> Thanks for your reply! That makes sense to me.
>>>> Regarding the other point I mentioned (the missing AAQI parameters in
>>>> some of the function calls), do you think they make sense, or is it
>>>> intentional that you use the non-batchAA version there?
>>>> My current understanding is that changing them to use BatchAA might
>>>> expose more issues that triggers the `isValueEqualInPotentialCycles` bug,
>>>> but if you have the bug fixed, make them use BatchAA as well could probably
>>>> improve performance a bit.
>>>>
>>>> Best,
>>>> Haoran
>>>>
>>>> Alina Sbirlea <asbirlea at google.com> 于2020年10月23日周五 上午11:21写道：
>>>>
>>>>> Some clarifications to the description in
>>>>> https://reviews.llvm.org/D89991:
>>>>> The clearing of the cache is not supposed to be done in BatchAA. The
>>>>> patch introducing BatchAA was not a refactoring patch, and it did not
>>>>> accidentally miss the lines to clear those caches.
>>>>>
>>>>> Nikita flagged the issue and I looked into it. The condition that's
>>>>> broken in BatchAA is the `isValueEqualInPotentialCycles`. This condition
>>>>> can be satisfied in a query and a result cached, but the condition broken
>>>>> in a subsequent query. Working out a fix is not straightforward.
>>>>> Yes, clearing the whole cache is a big hammer we can use, and, yes, it
>>>>> will lose all the performance benefits.
>>>>>
>>>>> For visibility, I uploaded a WIP fix:
>>>>> https://reviews.llvm.org/D90062
>>>>> There are still problems I have yet to solve in the above patch, hence
>>>>> the lack of reviewers. However, please feel free to test it and let me know
>>>>> if it resolves your failure.
>>>>>
>>>>> Best,
>>>>> Alina
>>>>>
>>>>>
>>>>> On Thu, Oct 22, 2020 at 6:47 AM Haoran Xu <haoranxu510 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I don't know anything about LLVM internals, but solely from the
>>>>>> commit message of your diff I think it's very likely.
>>>>>> So my current understanding is that the ``shrink_and_clear()`` is
>>>>>> required to produce correct aliasing result, but if we naively do it for
>>>>>> AAQI (as my patch proposed), we probably lose most of the performance
>>>>>> benefits of BatchedAA.
>>>>>>
>>>>>> However, this is only my understanding after spending 2 hours reading
>>>>>> the code from zero background knowledge, so I guess I can't say much about
>>>>>> it.
>>>>>>
>>>>>>
>>>>>> Nikita Popov <nikita.ppv at gmail.com> 于2020年10月22日周四 上午6:41写道：
>>>>>>
>>>>>>> I suspect that you found the same issue as
>>>>>>> https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e.
>>>>>>> We're trying to find a solution to that, but it's not entirely simple.
>>>>>>>
>>>>>>> Nikita
>>>>>>>
>>>>>>> On Thu, Oct 22, 2020 at 3:33 PM Haoran Xu via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> I managed to bisected out the diff introducing the bug:
>>>>>>>> https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02
>>>>>>>>
>>>>>>>> And I managed to figured out a fix: it seems (to me) that the diff
>>>>>>>> is mostly a code refactoring diff, but the author accidentally removed two
>>>>>>>> lines of functional logic, resulting in the bug.
>>>>>>>> After adding the two lines back, the miscompiled code works fine
>>>>>>>> now. However, I'm not certain of the performance implications of clearing
>>>>>>>> the whole AAQI.
>>>>>>>>
>>>>>>>>> diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>>> b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>>> index 232132958f7..9a8ca46093d 100644
>>>>>>>>> --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>>> +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>>> @@ -836,6 +836,8 @@ AliasResult BasicAAResult::alias(const
>>>>>>>>> MemoryLocation &LocA,
>>>>>>>>>    AliasResult Alias = aliasCheck(LocA.Ptr, LocA.Size,
>>>>>>>>> LocA.AATags, LocB.Ptr,
>>>>>>>>>                                   LocB.Size, LocB.AATags, AAQI);
>>>>>>>>>
>>>>>>>>> +  AAQI.AliasCache.shrink_and_clear();
>>>>>>>>> +  AAQI.IsCapturedCache.shrink_and_clear();
>>>>>>>>>    VisitedPhiBBs.clear();
>>>>>>>>>    return Alias;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>
>>>>>>>> I also noticed that the author forgot to change a few function
>>>>>>>> calls to use his new API, but I'm not sure if it is a correctness issue or
>>>>>>>> not. At least it's not causing the brainfxxk code miscompilation.
>>>>>>>>
>>>>>>>>> diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>>> b/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>>> index 06a33f659c1..94a9d12eb17 100644
>>>>>>>>> --- a/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>>> +++ b/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>>> @@ -231,7 +231,7 @@ ModRefInfo AAResults::getModRefInfo(const
>>>>>>>>> CallBase *Call,
>>>>>>>>>
>>>>>>>>>    // If Loc is a constant memory location, the call definitely
>>>>>>>>> could not
>>>>>>>>>    // modify the memory location.
>>>>>>>>> -  if (isModSet(Result) && pointsToConstantMemory(Loc, /*OrLocal*/
>>>>>>>>> false))
>>>>>>>>> +  if (isModSet(Result) && pointsToConstantMemory(Loc, AAQI,
>>>>>>>>> /*OrLocal*/ false))
>>>>>>>>>      Result = clearMod(Result);
>>>>>>>>>
>>>>>>>>>    return Result;
>>>>>>>>> @@ -308,7 +308,7 @@ ModRefInfo AAResults::getModRefInfo(const
>>>>>>>>> CallBase *Call1,
>>>>>>>>>
>>>>>>>>>        // ModRefC1 indicates what Call1 might do to Call2ArgLoc,
>>>>>>>>> and we use
>>>>>>>>>        // above ArgMask to update dependence info.
>>>>>>>>> -      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc);
>>>>>>>>> +      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc,
>>>>>>>>> AAQI);
>>>>>>>>>        ArgMask = intersectModRef(ArgMask, ModRefC1);
>>>>>>>>>
>>>>>>>>>        // Conservatively clear IsMustAlias unless only MustAlias
>>>>>>>>> is found.
>>>>>>>>> @@ -349,7 +349,7 @@ ModRefInfo AAResults::getModRefInfo(const
>>>>>>>>> CallBase *Call1,
>>>>>>>>>        // might Mod Call1ArgLoc, then we care about either a Mod
>>>>>>>>> or a Ref by
>>>>>>>>>        // Call2. If Call1 might Ref, then we care only about a Mod
>>>>>>>>> by Call2.
>>>>>>>>>        ModRefInfo ArgModRefC1 = getArgModRefInfo(Call1,
>>>>>>>>> Call1ArgIdx);
>>>>>>>>> -      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc);
>>>>>>>>> +      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc,
>>>>>>>>> AAQI);
>>>>>>>>>        if ((isModSet(ArgModRefC1) && isModOrRefSet(ModRefC2)) ||
>>>>>>>>>            (isRefSet(ArgModRefC1) && isModSet(ModRefC2)))
>>>>>>>>>          R = intersectModRef(unionModRef(R, ArgModRefC1), Result);
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'll submit a patch for review, but I would really appreciate any
>>>>>>>> comments here as well.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Haoran
>>>>>>>>
>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月22日周四 上午12:12写道：
>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> I just tried creduce, but it generated a code with completely
>>>>>>>>> undefined behavior (out of range accesses, etc).
>>>>>>>>> The "interesting" criteria I used is "timeout in -O1 but finishes
>>>>>>>>> in 20s in -O0". However, it turns out that the undefined behavior in
>>>>>>>>> creduce-generated code just makes the criteria "happens to" pass.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Haoran
>>>>>>>>>
>>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午10:32写道：
>>>>>>>>>
>>>>>>>>>> I was just able to determine the offending IR code before and
>>>>>>>>>> after the transformation. I'm now almost certain it's a bug in LLVM.
>>>>>>>>>>
>>>>>>>>>> Before transformation, we have the following IR (I renamed all
>>>>>>>>>> %xxx for brevity):
>>>>>>>>>>
>>>>>>>>>>> %1 = load i8, i8* %0, align 1
>>>>>>>>>>> %2 = add i8 %1, -1
>>>>>>>>>>> store i8 %2, i8* %0, align 1
>>>>>>>>>>>
>>>>>>>>>> The above IR is inside a loop, so the value in %0 can be
>>>>>>>>>> different in each run.
>>>>>>>>>>
>>>>>>>>>> The optimization pass changed the IR above to the following:
>>>>>>>>>>
>>>>>>>>>>> store i8 %3, i8* %0, align 1
>>>>>>>>>>>
>>>>>>>>>> where %3 is defined by
>>>>>>>>>>
>>>>>>>>>>> %4 = load i8, i8* %0, align 1
>>>>>>>>>>> %3 = add i8 %4, -1
>>>>>>>>>>>
>>>>>>>>>> in an earlier piece of IR.
>>>>>>>>>>
>>>>>>>>>> Apparently the pass treated %3 the same thing as %2 and it fired
>>>>>>>>>> CSE, without realizing that the content in %0 may have been changed by the
>>>>>>>>>> loop.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> David Blaikie <dblaikie at gmail.com> 于2020年10月21日周三 下午10:18写道：
>>>>>>>>>>
>>>>>>>>>>> Might be worth running the c source file through creduce or
>>>>>>>>>>> similar to narrow it down a bit that way too.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2020 at 9:12 PM Haoran Xu via llvm-dev <
>>>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> A further bisect using opt's -opt-bisect-limit option shows
>>>>>>>>>>>> that the following pass is causing the issue:
>>>>>>>>>>>>
>>>>>>>>>>>>> BISECT: running pass (39) Early CSE w/ MemorySSA on function
>>>>>>>>>>>>> (main)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午9:00写道：
>>>>>>>>>>>>
>>>>>>>>>>>>> I did a simple bisect on clang version, and it seems like
>>>>>>>>>>>>> clang 8.0.0 works correctly, but clang 9.0.0 failed to compile the code
>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>> https://godbolt.org/z/676Grr  <- if you change the clang
>>>>>>>>>>>>> version to 8.0.0, you will see the expected output in 'output' section.
>>>>>>>>>>>>> I don't have the ability to bisect on clang git history. I
>>>>>>>>>>>>> would greatly appreciate it if any one is willing to do that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午8:47写道：
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm really amazed to find out that under -O3, a simple piece
>>>>>>>>>>>>>> of C code generated from a brainfxxk-to-C transpiler is miscompiled.
>>>>>>>>>>>>>> As one probably know, the C code transpiled from brainfxxk
>>>>>>>>>>>>>> only contains 3 kind of statements:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (1) ++(*ptr) / --(*ptr)
>>>>>>>>>>>>>>> (2) ++ptr / --ptr
>>>>>>>>>>>>>>> (3) while (*ptr) { ... }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> where ptr is a uint8_t*.
>>>>>>>>>>>>>> So it seems very clear to me that the code contains no
>>>>>>>>>>>>>> undefined behavior (the pointer is uint8_t* and unsigned integer overflow
>>>>>>>>>>>>>> is not UD).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After further investigation, it seems like clang compiled
>>>>>>>>>>>>>> this loop:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> while (*ptr) {
>>>>>>>>>>>>>>>  --(*ptr);
>>>>>>>>>>>>>>>  ++ptr;
>>>>>>>>>>>>>>>  ++(*ptr);
>>>>>>>>>>>>>>>  --ptr;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  to an unconditional infinite loop under -O3, resulting in
>>>>>>>>>>>>>> the bug. The code snippet above seems completely benign to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I attached the offending program. With
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> clang a.c -O0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it worked fine (it should print out an ASCII-art picture of
>>>>>>>>>>>>>> mandelbrot fracture). However, with -O1 or -O3, it goes into a dead loop
>>>>>>>>>>>>>> (in the code snippet above) after printing out a few characters.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also tried UndefinedBehaviorSanitizer. Strangely, when
>>>>>>>>>>>>>> compiling using
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> clang a.c -O3  -fsanitize=undefined
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the code worked again, with no infinite loop, and no
>>>>>>>>>>>>>> undefined behavior reported.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So it seems to me a LLVM optimizer bug. I would greatly
>>>>>>>>>>>>>> appreciate if any one is willing to investigate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Haoran
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201023/176d0e3f/attachment.html>