[llvm-dev] Field sensitive alias analysis?

Tue Dec 8 07:44:07 PST 2015

Just remember that to LLVM, TBAA info doesn't tell it about types.

So when you say " Differentiating array subscripts also seems like out of
the scope of TBAA but adding type information that this memory assess
accessing array not a random object of given type could be useful."

Remember that LLVM has no notion of C/C++ types.

It doesn't know about them, it doesn't care about them.

What TBAA is telling it is that certain memory locations are disjoint.
It will *never* understand that what is being accessed is "a C++ struct
containing an array of shorts", because the type system doesn't know what a
"C struct" is, or a "C array" or a  "C short".

Thus, while helpful, TBAA info in LLVM tells it *nothing* about the types,
it only tells it that "if i am accessing this offset, it is disjoint with
accessing this other offset".

On Tue, Dec 8, 2015 at 3:38 AM, Dmitry Polukhin <dmitry.polukhin at gmail.com>
wrote:

> Jeroen, thank you for very useful link with the context. Indeed union
> cases are very complicated and I see places in code when TBAA gives up.
>
> Daniel, I completely agree that TBAA has limited power and can solve
> relatively simple cases only. So anything more complicated that involves
> intermediate variables that points to struct or array elements cannot be
> solved by TBAA alone. Differentiating array subscripts also seems like out
> of the scope of TBAA but adding type information that this memory assess
> accessing array not a random object of given type could be useful.
> Therefore TBAA can compliment points-to analysis and sometimes help in
> cases when points-to may not have enough information.
>
> On Mon, Dec 7, 2015 at 7:43 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>> TBAA is about types, not accesses.
>> TBAA "struct path data" is about access paths and types that are the end
>> of access paths, for the most part.
>>
>> It has no notion of access size, etc, only offset.
>>
>> It is possible to extend it and handle *very basic* cases in the frontend
>> (IE structs not containing unions anywhere, with constant accesses, etc)
>> But it would degrade quite quickly (you would likely need a sane way to
>> say "this is an access to an unknown offset into this type")
>>
>> Generally, rather than just try to produce metadata in the frontend, most
>> compilers perform field-sensitive points-to or something similar for
>> fields, and then rely on data dependence for differentiating array
>> subscripts.
>>
>> (This is what GCC does, LLVM has CFL-AA, but it's not field sensitive yet)
>>
>> So handling .size vs .a[] is probably possible in the frontend.
>>
>> Doing array subscript analysis in general, probably not something you
>> want in the frontend.
>> Handling tricky cases of what pointers to structs point to, probably the
>> domain of field-sensitive points-to
>>
>> On Mon, Dec 7, 2015 at 7:13 AM, Dmitry Polukhin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> BTW, I have found why it doesn't work for arrays. TBAA information
>>> propagation is not implemented in CodeGenFunction::EmitArraySubscriptExpr
>>> with "TODO: Preserve/extend path TBAA metadata?".
>>>
>>> On Fri, Dec 4, 2015 at 1:38 PM, Dmitry Polukhin <
>>> dmitry.polukhin at gmail.com> wrote:
>>>
>>>> As far as I can see it is specifics of arrays inside structs. Current
>>>> TBAA does distinguish non-array members with path sensitive TBAA (see !1
>>>> and !6 in my example below, TBAA has reference to struct !2 and offset). As
>>>> for arrays information that it was member of some struct get lost
>>>> completely (!7 has nothing about struct !2).
>>>>
>>>> struct S {
>>>>   int a;
>>>>   int b;
>>>>   int c[3];
>>>> };
>>>>
>>>> void foo(struct S* p) {
>>>>   p->a = 1;
>>>>   p->b = 2;
>>>>   p->c[0] = 3;
>>>>   p->c[1] = 4;
>>>> }
>>>>
>>>> define void @foo(%struct.S* nocapture %p) #0 {
>>>> entry:
>>>>   %a = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 0
>>>>   store i32 1, i32* %a, align 4, !tbaa !1
>>>>   %b = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 1
>>>>   store i32 2, i32* %b, align 4, !tbaa !6
>>>>   %arrayidx = getelementptr inbounds %struct.S, %struct.S* %p, i64 0,
>>>> i32 2, i64 0
>>>>   store i32 3, i32* %arrayidx, align 4, !tbaa !7
>>>>   %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %p, i64 0,
>>>> i32 2, i64 1
>>>>   store i32 4, i32* %arrayidx2, align 4, !tbaa !7
>>>>   ret void
>>>> }
>>>>
>>>> !0 = !{!"clang version 3.8.0 "}
>>>> !1 = !{!2, !3, i64 0}
>>>> !2 = !{!"S", !3, i64 0, !3, i64 4, !4, i64 8}
>>>> !3 = !{!"int", !4, i64 0}
>>>> !4 = !{!"omnipotent char", !5, i64 0}
>>>> !5 = !{!"Simple C/C++ TBAA"}
>>>> !6 = !{!2, !3, i64 4}
>>>> !7 = !{!3, !3, i64 0}
>>>>
>>>> I'm just start learning how TBAA in clang works so I don't know why it
>>>> was implemented this way.
>>>>
>>>> On Fri, Dec 4, 2015 at 11:06 AM, Vaivaswatha Nagaraj via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to optimize a simple C code and came across a situation
>>>>> where invariant code is not being moved out:
>>>>>
>>>>> On an -O3 compilation, I noticed that the "load" for the loop bounds
>>>>> (which remain invariant throughout) happens on each iteration of both the
>>>>> loops, even though it is not modified anywhere in the function "bigLoop".
>>>>> It seems that alias analysis is not able to say that the writes to one
>>>>> field in the structure does not impact the other field, leading to LICM
>>>>> being ineffective.
>>>>>
>>>>> Do any of the alias analyses currently have some kind of field
>>>>> sensitivity that can help in this case?
>>>>>
>>>>> ------------------------- test case
>>>>> ------------------------------------
>>>>> #include <stdlib.h>
>>>>> #include <stdio.h>
>>>>>
>>>>> #define SIZE 100
>>>>>
>>>>> struct AS {
>>>>>   int a[SIZE+4];
>>>>>   int size;
>>>>> } A;
>>>>>
>>>>> void bigLoop(void)
>>>>> {
>>>>>   unsigned i, j;
>>>>>
>>>>>   for (i = 0; i < A.size; i++) {
>>>>>     A.a[i+2] +=  A.a[i];
>>>>>   }
>>>>>   for (i = 0; i < A.size; i++) {
>>>>>     A.a[i+2] *=  A.a[i];
>>>>>   }
>>>>> }
>>>>>
>>>>> int main()
>>>>> {
>>>>>   A.size = random()%SIZE;
>>>>>   for (unsigned i = 0; i < A.size; i++) {
>>>>>     A.a[i] = random()%23;
>>>>>   }
>>>>>   bigLoop();
>>>>>   return 0;
>>>>> }
>>>>>
>>>>> Thanks,
>>>>>
>>>>>   - Vaivaswatha
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151208/18c9dcc6/attachment.html>