[llvm-dev] Aliasing rules difference between GCC and Clang

Ivan Kosarev via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 18 01:15:19 PST 2019


Hi, Jonas,

On 18/01/2019 09:52, Jonas Paulsson wrote:
> Hi Ivan,
>
> On 2019-01-17 18:09, Ivan Kosarev wrote:
>> Hello, Jonas,
>>
>> > It seems that when the struct member is an array, the base type
>> > becomes the element type. This is simply unhandled still in
>> > CodeGenTBAA ("TODO"). My question then is if it would be
>> > possible to instead create the DAG nodes !"A" and !"B" (as in
>> > the first type DAG above) and use them as the base types and
>> > then have the access type to be !"int", with the right offset
>> > into the struct node?
>>
>> Correct, the current default TBAA info is not capable of representing 
>> accesses to (ranges of) array elements, hence the decay to 
>> element-type access.
>>
>> Still, for elements indexed with constants. such as a->i[0] in your 
>> example, it is indeed possible to generate proper TBAA tags. Such 
>> tags would be no different from tags generated for accesses to 
>> non-array fields that reside at the same offset within the same base 
>> type. If there are real life cases where adding the support for 
>> constant-indexed accesses would make a difference, then I think that 
>> would be an improvement.
>>
>> > struct S { int i[2]; long l[2]; };
>> > ...
>> > An access like s->i[0], would then get a node like
>> > and this would give a NoAlias above, right? Did I understand
>> > this correctly, or did I miss anything relevant?
>>
>> Yes, TBAA assumes that accesses to int's and long's do not alias.
>>
> What I was actually hoping for was that by keeping the base type ('S') 
> in the type DAG instead of just 'int', two accesses into two different 
> access (struct) types both accessing an 'int' element would be 
> 'NoAlias' by TBAA.

It should result in NoAlias, provided the access types or offsets are 
different, yes.

>
> I don't think handling just constant indexes would help (me). I was 
> hoping that perhaps we could keep the current TBAA algorithm and just 
> extend it by also adding the struct base type in the type DAG. So 
> instead of letting an array (member) access become just 'int', it 
> could become 'A' -> 'int', which would not alias with 'B' -> 'int'.

IIRC, there were proposals/attempts to represent accesses to array 
elements as accesses to their first elements, which can technically be 
encoded with the current TBAA format and thus may work as an incremental 
improvement on top of the existing TBAA machinery you are looking for. 
But this may need making sure there will be no regressions for some 
tricky cases like those that involve GCC's type punning and changing 
effective types within unions and in dynamic memory.

>
> Would this be a possible incremental improvement while working towards 
> the new TBAA in your opinion? To me, it seems like a first good step 
> to just handle different struct types, which does not require any size 
> awareness. I am not sure however how simple this would be, and I would 
> not want it to get in your way with the new TBAA.
>
> /Jonas
>

Regards,


>
>> > Would this improvement require the NewStructPathTBAA flag to be
>> > on, and if so, what is the outlook on this?
>>
>> It depends on whether handling of constant-indexed accesses would be 
>> enough for your use cases.
>>
>> One of the goals behind the new TBAA is to be able to accurately 
>> represent various kinds accesses to subobjects. This includes 
>> non-constant-indexed accesses to array elements. More info on how 
>> such accesses are supposed to be represented can be found here:
>>
>> https://reviews.llvm.org/D40975
>>
>> If that's what you need, then the new TBAA info seems to be the most 
>> promising option. But please note that it's still a work in progress 
>> and is not mature enough for production use.
>>
>> Hope this helps.
>>
>> Regards,
>>
>>
>> On 14/01/2019 15:57, Jonas Paulsson wrote:
>>> CAUTION: This email originated from outside of the organization. Do 
>>> not click links or open attachments unless you recognize the sender 
>>> and know the content is safe.  If you suspect potential phishing or 
>>> spam email, report it to ReportSpam at accesssoftek.com
>>>
>>> Hi,
>>>
>>> It was a while now since I first asked about TBAA, and I am now getting
>>> back to this, trying to understand TBAA as it seems that the issue I
>>> described earlier originally in this thread is still one of many 
>>> "TODO"s
>>> in CGExpr.cpp
>>> (http://lists.llvm.org/pipermail/llvm-dev/2018-September/126290.html).
>>>
>>> I would like to help on this to the best of my abilities, but I'm not
>>> quite sure if there is a current plan of incremental changes, or if it
>>> would be possible to handle the array member of a struct case
>>> specifically at this point?
>>>
>>> I am tempted to believe that the extension to handle my particular case
>>> is not too far-fetched, as Eli indicated earlier:
>>>
>>> This program:
>>>
>>> struct A { int i; };
>>> struct B { int i; };
>>>
>>> void h(struct A *a, struct B *b, int *prim_i) {
>>>   a->i = 0;
>>>   b->i = 0;
>>>   *prim_i = 0;
>>> }
>>>
>>> translates to
>>>
>>> %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0
>>> store i32 0, i32* %i, align 4, !tbaa !2
>>> %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0
>>> store i32 0, i32* %i1, align 4, !tbaa !7
>>> store i32 0, i32* %prim_i, align 4, !tbaa !9
>>>
>>> with this type DAG:
>>>
>>>             !2 = !{!3, !4, i64 0}
>>>          !3 = !{!"A", !4, i64 0}
>>>       !4 = !{!"int", !5, i64 0}
>>>    !5 = !{!"omnipotent char", !6, i64 0}
>>> !6 = !{!"Simple C/C++ TBAA"}
>>>             !7 = !{!8, !4, i64 0}
>>>          !8 = !{!"B", !4, i64 0}
>>>          !9 = !{!4, !4, i64 0}
>>>
>>> Just as one would hope for, TBAA produces a NoAlias result between the
>>> stores to %A and %B:
>>>
>>> NoAlias:    store i32 0, i32* %i1, align 4, !tbaa !7 <-> store i32 0,
>>> i32* %i, align 4, !tbaa !2
>>> MayAlias:   store i32 0, i32* %prim_i, align 4, !tbaa !9 <->   store 
>>> i32
>>> 0, i32* %i, align 4, !tbaa !2
>>> MayAlias:   store i32 0, i32* %prim_i, align 4, !tbaa !9 <->   store 
>>> i32
>>> 0, i32* %i1, align 4, !tbaa !7
>>>
>>> The above indicates that TBAA is capable of handling structs with 
>>> scalar
>>> members, but if I change the program to look more like the 
>>> benchmark, like:
>>>
>>> struct A { int i[2]; };
>>> struct B { int i[2]; };
>>>
>>> void h(struct A *a, struct B *b, int *prim_i) {
>>>   a->i[0] = 0;
>>>   b->i[0] = 0;
>>>   *prim_i = 0;
>>> }
>>>
>>> , which translates to
>>>
>>> %i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0
>>> %arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %i, i64 0, 
>>> i64 0
>>> store i32 0, i32* %arrayidx, align 4, !tbaa !2
>>> %i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0
>>> %arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %i1, i64 
>>> 0, i64 0
>>> store i32 0, i32* %arrayidx2, align 4, !tbaa !2
>>> store i32 0, i32* %prim_i, align 4, !tbaa !2
>>>
>>> , the type DAG is generated as
>>>
>>>          !2 = !{!3, !3, i64 0}
>>>       !3 = !{!"int", !4, i64 0}
>>>    !4 = !{!"omnipotent char", !5, i64 0}
>>> !5 = !{!"Simple C/C++ TBAA"}
>>>
>>> This makes all the stores alias, which is overly conservative, as
>>> %struct.A and %struct.B do not alias.
>>>
>>> It seems that when the struct member is an array, the base type becomes
>>> the element type. This is simply unhandled still in CodeGenTBAA
>>> ("TODO"). My question then is if it would be possible to instead create
>>> the DAG nodes !"A" and !"B" (as in the first type DAG above) and use
>>> them as the base types and then have the access type to be !"int", with
>>> the right offset into the struct node?
>>>
>>> A struct like
>>>
>>> struct S { int i[2]; long l[2]; };
>>>
>>> would then be represented with a base node, something like
>>>
>>>          !3 = !{!"S", !4, i64 0, !6, 8}
>>>       !4 = !{!"int", !5, i64 0}
>>>    !5 = !{!"omnipotent char", !6, i64 0}
>>>       !6 = !{!"long", !5, i64 0}
>>>
>>> , giving !"int" as base at offset 0 and !"long" as base at offset 8
>>> (given that i[2] is 8 bytes).
>>>
>>> An access like s->i[0], would then get a node like
>>>
>>> !2 = !{!3, %4, i64 0}
>>>
>>> , and this would give a NoAlias above, right? Did I understand this
>>> correctly, or did I miss anything relevant?
>>>
>>> What method(s) would need to be modified to handle this? Is it feasible
>>> to do this now, or are there other planned steps that somebody is
>>> working on?
>>>
>>> Would this improvement require the NewStructPathTBAA flag to be on, and
>>> if so, what is the outlook on this?
>>>
>>> @Ivan: For what I can see, you had a list of planned improvements. What
>>> are your thoughts on this?
>>>
>>> Thanks for any help,
>>>
>>> Jonas
>>>
>>
>



More information about the llvm-dev mailing list