[llvm-dev] clang10 mis-compiles simple C program transpiled from brainfxxk

Alina Sbirlea via llvm-dev llvm-dev at lists.llvm.org
Fri Oct 23 17:15:35 PDT 2020


Hi Haoran,

I think you may have come across an issue that was fixed already. I cannot
reproduce the failure at ToT.
In godbolt it reproduces with the clang11 release, but not with trunk.

Best,
Alina


On Fri, Oct 23, 2020 at 4:25 PM Haoran Xu <haoranxu510 at gmail.com> wrote:

> Hi Alina,
>
> Thanks for your reply. I don't have commit access (it's my first patch to
> LLVM), so please do it on my behalf.
> It's kind of hard for me to compile LLVM on my local machine, so can you
> just check that the failing code I attached in original email works (i.e.
> does not dead-loop and prints out a fracture-like ASCII art) in the
> original email after Nikita's diff?
>
> Thanks!
> Haoran
>
>
> Alina Sbirlea <asbirlea at google.com> 于2020年10月23日周五 下午3:48写道:
>
>> Hi Haoran,
>>
>> Yes, you're right, the instances you added in the patch can also pass
>> along the AAQI param.
>> Do you have commit access to check in this change or should I do it on
>> your behalf?
>>
>> The fix Nikita proposed (https://reviews.llvm.org/D90066) should resolve
>> the original issue in AA and BatchAA, but please let us know if you're
>> still seeing it with this patch.
>>
>> Best,
>> Alina
>>
>> On Fri, Oct 23, 2020 at 1:25 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
>>
>>> Hi Alina,
>>>
>>> Thanks for your reply! That makes sense to me.
>>> Regarding the other point I mentioned (the missing AAQI parameters in
>>> some of the function calls), do you think they make sense, or is it
>>> intentional that you use the non-batchAA version there?
>>> My current understanding is that changing them to use BatchAA might
>>> expose more issues that triggers the `isValueEqualInPotentialCycles` bug,
>>> but if you have the bug fixed, make them use BatchAA as well could probably
>>> improve performance a bit.
>>>
>>> Best,
>>> Haoran
>>>
>>> Alina Sbirlea <asbirlea at google.com> 于2020年10月23日周五 上午11:21写道:
>>>
>>>> Some clarifications to the description in
>>>> https://reviews.llvm.org/D89991:
>>>> The clearing of the cache is not supposed to be done in BatchAA. The
>>>> patch introducing BatchAA was not a refactoring patch, and it did not
>>>> accidentally miss the lines to clear those caches.
>>>>
>>>> Nikita flagged the issue and I looked into it. The condition that's
>>>> broken in BatchAA is the `isValueEqualInPotentialCycles`. This condition
>>>> can be satisfied in a query and a result cached, but the condition broken
>>>> in a subsequent query. Working out a fix is not straightforward.
>>>> Yes, clearing the whole cache is a big hammer we can use, and, yes, it
>>>> will lose all the performance benefits.
>>>>
>>>> For visibility, I uploaded a WIP fix:
>>>> https://reviews.llvm.org/D90062
>>>> There are still problems I have yet to solve in the above patch, hence
>>>> the lack of reviewers. However, please feel free to test it and let me know
>>>> if it resolves your failure.
>>>>
>>>> Best,
>>>> Alina
>>>>
>>>>
>>>> On Thu, Oct 22, 2020 at 6:47 AM Haoran Xu <haoranxu510 at gmail.com>
>>>> wrote:
>>>>
>>>>> I don't know anything about LLVM internals, but solely from the commit
>>>>> message of your diff I think it's very likely.
>>>>> So my current understanding is that the ``shrink_and_clear()`` is
>>>>> required to produce correct aliasing result, but if we naively do it for
>>>>> AAQI (as my patch proposed), we probably lose most of the performance
>>>>> benefits of BatchedAA.
>>>>>
>>>>> However, this is only my understanding after spending 2 hours reading
>>>>> the code from zero background knowledge, so I guess I can't say much about
>>>>> it.
>>>>>
>>>>>
>>>>> Nikita Popov <nikita.ppv at gmail.com> 于2020年10月22日周四 上午6:41写道:
>>>>>
>>>>>> I suspect that you found the same issue as
>>>>>> https://github.com/llvm/llvm-project/commit/ddd0f083184feb489684f87cf28d5eb10e0a244e.
>>>>>> We're trying to find a solution to that, but it's not entirely simple.
>>>>>>
>>>>>> Nikita
>>>>>>
>>>>>> On Thu, Oct 22, 2020 at 3:33 PM Haoran Xu via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> I managed to bisected out the diff introducing the bug:
>>>>>>> https://github.com/llvm/llvm-project/commit/bfc779e491099213d74c057b5727bde976c7ba02
>>>>>>>
>>>>>>> And I managed to figured out a fix: it seems (to me) that the diff
>>>>>>> is mostly a code refactoring diff, but the author accidentally removed two
>>>>>>> lines of functional logic, resulting in the bug.
>>>>>>> After adding the two lines back, the miscompiled code works fine
>>>>>>> now. However, I'm not certain of the performance implications of clearing
>>>>>>> the whole AAQI.
>>>>>>>
>>>>>>>> diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>> b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>> index 232132958f7..9a8ca46093d 100644
>>>>>>>> --- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>> +++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
>>>>>>>> @@ -836,6 +836,8 @@ AliasResult BasicAAResult::alias(const
>>>>>>>> MemoryLocation &LocA,
>>>>>>>>    AliasResult Alias = aliasCheck(LocA.Ptr, LocA.Size, LocA.AATags,
>>>>>>>> LocB.Ptr,
>>>>>>>>                                   LocB.Size, LocB.AATags, AAQI);
>>>>>>>>
>>>>>>>> +  AAQI.AliasCache.shrink_and_clear();
>>>>>>>> +  AAQI.IsCapturedCache.shrink_and_clear();
>>>>>>>>    VisitedPhiBBs.clear();
>>>>>>>>    return Alias;
>>>>>>>>  }
>>>>>>>>
>>>>>>>
>>>>>>> I also noticed that the author forgot to change a few function calls
>>>>>>> to use his new API, but I'm not sure if it is a correctness issue or not.
>>>>>>> At least it's not causing the brainfxxk code miscompilation.
>>>>>>>
>>>>>>>> diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>> b/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>> index 06a33f659c1..94a9d12eb17 100644
>>>>>>>> --- a/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>> +++ b/llvm/lib/Analysis/AliasAnalysis.cpp
>>>>>>>> @@ -231,7 +231,7 @@ ModRefInfo AAResults::getModRefInfo(const
>>>>>>>> CallBase *Call,
>>>>>>>>
>>>>>>>>    // If Loc is a constant memory location, the call definitely
>>>>>>>> could not
>>>>>>>>    // modify the memory location.
>>>>>>>> -  if (isModSet(Result) && pointsToConstantMemory(Loc, /*OrLocal*/
>>>>>>>> false))
>>>>>>>> +  if (isModSet(Result) && pointsToConstantMemory(Loc, AAQI,
>>>>>>>> /*OrLocal*/ false))
>>>>>>>>      Result = clearMod(Result);
>>>>>>>>
>>>>>>>>    return Result;
>>>>>>>> @@ -308,7 +308,7 @@ ModRefInfo AAResults::getModRefInfo(const
>>>>>>>> CallBase *Call1,
>>>>>>>>
>>>>>>>>        // ModRefC1 indicates what Call1 might do to Call2ArgLoc,
>>>>>>>> and we use
>>>>>>>>        // above ArgMask to update dependence info.
>>>>>>>> -      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc);
>>>>>>>> +      ModRefInfo ModRefC1 = getModRefInfo(Call1, Call2ArgLoc,
>>>>>>>> AAQI);
>>>>>>>>        ArgMask = intersectModRef(ArgMask, ModRefC1);
>>>>>>>>
>>>>>>>>        // Conservatively clear IsMustAlias unless only MustAlias is
>>>>>>>> found.
>>>>>>>> @@ -349,7 +349,7 @@ ModRefInfo AAResults::getModRefInfo(const
>>>>>>>> CallBase *Call1,
>>>>>>>>        // might Mod Call1ArgLoc, then we care about either a Mod or
>>>>>>>> a Ref by
>>>>>>>>        // Call2. If Call1 might Ref, then we care only about a Mod
>>>>>>>> by Call2.
>>>>>>>>        ModRefInfo ArgModRefC1 = getArgModRefInfo(Call1,
>>>>>>>> Call1ArgIdx);
>>>>>>>> -      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc);
>>>>>>>> +      ModRefInfo ModRefC2 = getModRefInfo(Call2, Call1ArgLoc,
>>>>>>>> AAQI);
>>>>>>>>        if ((isModSet(ArgModRefC1) && isModOrRefSet(ModRefC2)) ||
>>>>>>>>            (isRefSet(ArgModRefC1) && isModSet(ModRefC2)))
>>>>>>>>          R = intersectModRef(unionModRef(R, ArgModRefC1), Result);
>>>>>>>>
>>>>>>>
>>>>>>> I'll submit a patch for review, but I would really appreciate any
>>>>>>> comments here as well.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Haoran
>>>>>>>
>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月22日周四 上午12:12写道:
>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> I just tried creduce, but it generated a code with completely
>>>>>>>> undefined behavior (out of range accesses, etc).
>>>>>>>> The "interesting" criteria I used is "timeout in -O1 but finishes
>>>>>>>> in 20s in -O0". However, it turns out that the undefined behavior in
>>>>>>>> creduce-generated code just makes the criteria "happens to" pass.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Haoran
>>>>>>>>
>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午10:32写道:
>>>>>>>>
>>>>>>>>> I was just able to determine the offending IR code before and
>>>>>>>>> after the transformation. I'm now almost certain it's a bug in LLVM.
>>>>>>>>>
>>>>>>>>> Before transformation, we have the following IR (I renamed all
>>>>>>>>> %xxx for brevity):
>>>>>>>>>
>>>>>>>>>> %1 = load i8, i8* %0, align 1
>>>>>>>>>> %2 = add i8 %1, -1
>>>>>>>>>> store i8 %2, i8* %0, align 1
>>>>>>>>>>
>>>>>>>>> The above IR is inside a loop, so the value in %0 can be different
>>>>>>>>> in each run.
>>>>>>>>>
>>>>>>>>> The optimization pass changed the IR above to the following:
>>>>>>>>>
>>>>>>>>>> store i8 %3, i8* %0, align 1
>>>>>>>>>>
>>>>>>>>> where %3 is defined by
>>>>>>>>>
>>>>>>>>>> %4 = load i8, i8* %0, align 1
>>>>>>>>>> %3 = add i8 %4, -1
>>>>>>>>>>
>>>>>>>>> in an earlier piece of IR.
>>>>>>>>>
>>>>>>>>> Apparently the pass treated %3 the same thing as %2 and it fired
>>>>>>>>> CSE, without realizing that the content in %0 may have been changed by the
>>>>>>>>> loop.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> David Blaikie <dblaikie at gmail.com> 于2020年10月21日周三 下午10:18写道:
>>>>>>>>>
>>>>>>>>>> Might be worth running the c source file through creduce or
>>>>>>>>>> similar to narrow it down a bit that way too.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2020 at 9:12 PM Haoran Xu via llvm-dev <
>>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> A further bisect using opt's -opt-bisect-limit option shows
>>>>>>>>>>> that the following pass is causing the issue:
>>>>>>>>>>>
>>>>>>>>>>>> BISECT: running pass (39) Early CSE w/ MemorySSA on function
>>>>>>>>>>>> (main)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午9:00写道:
>>>>>>>>>>>
>>>>>>>>>>>> I did a simple bisect on clang version, and it seems like clang
>>>>>>>>>>>> 8.0.0 works correctly, but clang 9.0.0 failed to compile the code correctly.
>>>>>>>>>>>> https://godbolt.org/z/676Grr  <- if you change the clang
>>>>>>>>>>>> version to 8.0.0, you will see the expected output in 'output' section.
>>>>>>>>>>>> I don't have the ability to bisect on clang git history. I
>>>>>>>>>>>> would greatly appreciate it if any one is willing to do that.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> Haoran Xu <haoranxu510 at gmail.com> 于2020年10月21日周三 下午8:47写道:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm really amazed to find out that under -O3, a simple piece
>>>>>>>>>>>>> of C code generated from a brainfxxk-to-C transpiler is miscompiled.
>>>>>>>>>>>>> As one probably know, the C code transpiled from brainfxxk
>>>>>>>>>>>>> only contains 3 kind of statements:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> (1) ++(*ptr) / --(*ptr)
>>>>>>>>>>>>>> (2) ++ptr / --ptr
>>>>>>>>>>>>>> (3) while (*ptr) { ... }
>>>>>>>>>>>>>>
>>>>>>>>>>>>> where ptr is a uint8_t*.
>>>>>>>>>>>>> So it seems very clear to me that the code contains no
>>>>>>>>>>>>> undefined behavior (the pointer is uint8_t* and unsigned integer overflow
>>>>>>>>>>>>> is not UD).
>>>>>>>>>>>>>
>>>>>>>>>>>>> After further investigation, it seems like clang compiled this
>>>>>>>>>>>>> loop:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> while (*ptr) {
>>>>>>>>>>>>>>  --(*ptr);
>>>>>>>>>>>>>>  ++ptr;
>>>>>>>>>>>>>>  ++(*ptr);
>>>>>>>>>>>>>>  --ptr;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>  to an unconditional infinite loop under -O3, resulting in the
>>>>>>>>>>>>> bug. The code snippet above seems completely benign to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I attached the offending program. With
>>>>>>>>>>>>>
>>>>>>>>>>>>>> clang a.c -O0
>>>>>>>>>>>>>>
>>>>>>>>>>>>> it worked fine (it should print out an ASCII-art picture of
>>>>>>>>>>>>> mandelbrot fracture). However, with -O1 or -O3, it goes into a dead loop
>>>>>>>>>>>>> (in the code snippet above) after printing out a few characters.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also tried UndefinedBehaviorSanitizer. Strangely, when
>>>>>>>>>>>>> compiling using
>>>>>>>>>>>>>
>>>>>>>>>>>>>> clang a.c -O3  -fsanitize=undefined
>>>>>>>>>>>>>>
>>>>>>>>>>>>> the code worked again, with no infinite loop, and no undefined
>>>>>>>>>>>>> behavior reported.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it seems to me a LLVM optimizer bug. I would greatly
>>>>>>>>>>>>> appreciate if any one is willing to investigate.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Haoran
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>
>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201023/e6214c16/attachment-0001.html>


More information about the llvm-dev mailing list