[LLVMdev] [PATCH] Vectorizing Global Structures - Take 2

Duncan Sands baldrick at free.fr
Sun Feb 17 05:57:27 PST 2013


Hi Daniel,

> This leads me to suggest  not throwing this info in the TBAA tree, for
> three reasons:
> 1. It's not TBAA.

LLVM's tbaa metadata doesn't really have to have anything to do with types: as
far as I know it's just a general way of attaching this-may[-not]-alias-that
info to memory operations.  TBAA is simply the main user.  That said, the tree
structure (as opposed to a DAG) limits things quite a lot.

Ciao, Duncan.

> 2. You are probably eventually want  to store a bunch of info the rest
> of TBAA does not care about :)
> 3. It gives the frontends freedom to say things about the field
> accesses that relate to whether they can alias, but are not related to
> the types.  For example, maybe all the values of X in f[X] are
> guaranteed to be multiples of 4, and the frontend can prove that.
> Stride info like this is not always/really related to the types
>
>
>
>> And without changing the semantics of LLVM
>> IR, or doing some global range analysis (which you might not be able to do),
>> or inserting runtime checks, I don't see any way to conclude no-alias in the
>> above example besides some TBAA.
>>
>> On Feb 15, 2013, at 6:27 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> On Fri, Feb 15, 2013 at 12:49 AM, Arnold Schwaighofer
>> <aschwaighofer at apple.com> wrote:
>>>
>>> On Feb 14, 2013, at 10:21 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>>
>>>> What I meant by saying "understanding independent objects inside a
>>>> structure" to is that just by looking at an llvm type say for example struct
>>>> { int A[100]; int B[100]} you and seeing two gep accesses that index into
>>>> the first and second field of the type you can not necessarily say that the
>>>> access is not overlapping without fully evaluating the address computation:
>>>
>>>
>>> Yes, you need to fully evaluate it, but that's not really related to TBAA
>>> :)
>>>
>>>
>>>
>>> I was thinking of type trees that encoded fields giving more guarantees
>>> than LLVM IR currently conveys. Admittedly, I have not though this through,
>>> whether this could even remotely work for C.
>>>
>>> In LLVM IR:
>>>
>>> struct { int A[100]; int B[100]} S;
>>>
>>> ptr = gep S, 0, 0, x
>>> ptr2 = gep S, 0, 1, y
>>>
>>> = load ptr, !tbaa !"structS::A"
>>> = load ptr2, !tbaa !"structS::B"
>>>
>>> using this you could tell that ptr and ptr2 do not alias without knowing
>>> about x and y.
>>
>>
>>
>> you've basically just pushed the problem off into the frontend :)
>>
>>
>> Backend developers like to do this ;)
>
> Don't I know it.
>
>>
>> Something needs to know to tag this with structS::A, and not something that
>> combines structS::A and structS::B.
>> For simple structure accesses (IE not through pointers), this is certainly
>> possible.
>>
>>
>> An int pointer load/store would have a TBAA tag of 'int *'  which in my TBAA
>> DAG aliases with structS:A/B so we would give a safe answer (albeit possibly
>> more conservative than necessary).
>>
>>    = *(int*)ptr  // = load ptr, !tbaa !"int *"
>>
>> For any pointer access, it's not possible without pointer analysis.r
>>
>>
>> I am not sure what you mean by this. I believe you can give a correct answer
>> using TBAA DAGs (see farther below).
>
> Yes, I did not mean to imply you cannot give safe answers, only that
> you cannot give ones that are useful to the aliaser for proving
> no-alias.
>
> My apologies for being imprecise.
>>   However, I do understand that you will
>> get a better / less conservative answer in many cases by doing pointer
>> analysis.
>>
>>    int (*ptr)[100] = S.A;
>>    int *ptr2 = ptr;
>>    *ptr2 ; // points-to analysis can tell me that ptr2 points-to S.A but not
>> S.B while TBAA would tell me they both may-alias ptr2
> Yes.
>
>> Note, that our current chained BasicAA and TBAA approach would also catch
>> simple case like the one above without doing full scale global points-to
>> analysis.
>>
>> IE a deference of an int (*[100]) could be an access to structS::A,
>> structS::B, or some other random type.
>>
>> Note that your TBAA DAG below does not take this into account.
>>
>>
>> Yes, indeed I forgot int (*)[]. I admit my DAG was not complete you would
>> also have to have int (*[]), void*, char* - and I am sure I forgot some more
>> (structB::int*, structB::char*, stuctA::structB::int*, etc) - pointing to
>> structS::A.
>>
>> Say you have
>>
>>    *&S.A[0] = val;  // The front-end knows you have a structS::A access.
>>    int *ptr2 = *ptr;   // The front knows that you have a int(*)[] access.
>>
>>
>> The front-end would generate
>>
>>    store sa, val, !tbaa "int (*structS::A)[]"
>>    ptr2 = load %ptr, !tbaa "int(*)[]"
>>
>> and if we use a more complete TBAA type DAG we could conclude that they
>> may-alias because the TBAA DAG tells us:
>>
>>    int (*structS::A)[100] <-   int*, char*, void*, structS*, int(*)[], …?
>>    int (*structS::B)[100] <- /
>
> Yes.
>
>>
>> However, for the access below we could conclude that they don't alias
>> (because structS::A cannot not point to structS::B and vice versa).
>>
>>    foo(struct a f, int i, int y) {
>>       f.A[i] = ;         // store ptr2, !tbaa !"structS::A"
>>               = f.B[y]; // load ptr,    !tbaa !"structS::B"
>>    }
>
> Yes, in these cases, and other direct structure accesses, I agree it would work.
> As I mentioned, it is used  in GCC, but we don't "pollute" the TBAA
> tree with it.
>
>>
>> More examples:
>>
>> struct N{
>>      int a[9];
>>      struct A {
>>          int f[9];
>>          int g[9];
>>          int *ptr;
>>      } ;
>> }
>>
>> in this case an access using a &N.a[], &N.f[] pointer would not alias (point
>> to each other in the TBAA DAG) but a N.ptr integer pointer would point to
>> either in the DAG.
>>
>>>
>>> In the case of pointer loops that iterate over your example type, it will
>>> properly discover that it overruns into the next field, and properly mark it
>>> as pointing-to that field as well.
>>>
>>>
>>> So for the following example it will conservatively say arrayidx and
>>> arrayidx2 alias?
>>>
>>> %struct.anon = type { [256 x i64], i64, [256 x i64] }
>>>
>>> define void @foo(i32 %x, i32 %y) {
>>>    %arrayidx = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, i32
>>> %x
>>>    %0 = store i64* %arrayidx, align 8
>>>    %arrayidx2 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 2,
>>> i32 %y
>>>    %1 = store i64* %arrayidx2, align 8
>>> }
>>
>>
>>
>> Yes, when converted to may-alias info, it will.
>>
>>
>> Okay.
>>
>> When I hear fields I think types. When I think types, I assume that *it is
>> assumed* (I made this mistake the first time I looked at BasicAA) that you
>> can treat the type as a guarantee about certain properties including things
>> like that you are not allowed to walk beyond array bounds which would imply
>> that you can treat an access over type of structs/arrays like a path in a
>> tree (I wondered why we don't do this the first time I looked at BasicAA -
>> until I dug deeper).
>>
>> My first (wrong) intuition was the set of possible accessible locations in
>> the following two geps is disjunct
>>    = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, ...
>>    = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 1, …
>>
>> Hence, my initial warning about "understanding independent objects inside a
>> structure".
>>
>> I am seeing TBAA as a way to give me similar guarantees on top of the
>> existing semantics of the IR.
>>
>>
>>
>>  From what I read (it's been quite a while) t's actually not valid in
>> C, but valid in LLVM IR.
>> At least, as written.
>>
>> §6.5.2.1, paragraph 2:
>>
>> A postfix expression followed by an expression in square brackets []
>> is a subscripted designation of an element of an array object. The
>> definition of the subscript operator [] is that E1[E2] is identical to
>> (*((E1)+(E2))). Because of the conversion rules that apply to the
>> binary + operator, if E1 is an array object (equivalently, a pointer
>> to the initial element of an array object) and E2 is an integer,
>> E1[E2] designates the E2-th element of E1 (counting from zero).
>>
>> So it's going to be a pointer plus an address.
>>
>> §6.5.6, paragraph 8:
>>
>> When an expression that has integer type is added to or subtracted
>> from a pointer, the result has the type of the pointer operand. If the
>> pointer operand points to an element of an array object, and the array
>> is large enough, the result points to an element offset from the
>> original element such that the difference of the subscripts of the
>> resulting and original array elements equals the integer expression.
>> In other words, if the expression P points to the i-th element of an
>> array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
>> (where N has the value n) point to, respectively, the i+n-th and
>> i-n-th elements of the array object, provided they exist. Moreover, if
>> the expression P points to the last element of an array object, the
>> expression (P)+1 points one past the last element of the array object,
>> and if the expression Q points one past the last element of an array
>> object, the expression (Q)-1 points to the last element of the array
>> object. If both the pointer operand and the result point to elements
>> of the same array object, or one past the last element of the array
>> object, the evaluation shall not produce an overflow; otherwise, the
>> behavior is undefined. If the result points one past the last element
>> of the array object, it shall not be used as the operand of a unary *
>> operator that is evaluated.
>>
>>
>> you fall into the "otherwise, the behavior is undefined".
>>
>> It would be explicitly valid to convert it to char *, and read it up
>> to and through B, but not using the array access operator.
>>
>>
>> Yes, I vaguely remember reading something like this just was not 100% sure
>> anymore. Thanks for looking it up.
>>
>> IE char * foo = &<beginning struct containing A and B>
>>
>> <walk through A, get to B>
>>
>>
>>
>> Thanks,
>> Arnold
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>




More information about the llvm-dev mailing list