[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Mon Mar 11 14:06:39 PDT 2013

On Mar 11, 2013, at 1:17 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com> wrote:
>> Based on discussions with John McCall
>> 
>> We currently focus on field accesses of structs, more specifically, on fields that are scalars or structs.
>> 
>> Fundamental rules from C11
>> --------------------------
>> An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]
>> 1. a type compatible with the effective type of the object,
>> 2. a qualified version of a type compatible with the effective type of the object,
>> 3. a type that is the signed or unsigned type corresponding to the effective type of the object,
>> 4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
>> 5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
>> 6. a character type.
>> 
> 
>> Example
>> -------
>>  struct A {
>>    int x;
>>    int y;
>>  };
>>  struct B {
>>    A a;
>>    int z;
>>  };
>>  struct C {
>>    B b1;
>>    B b2;
>>    int *p;
>>  };
>> 
>> Type DAG:
>>  int <- A::x <- A
>>  int <- A::y <- A <- B::a <- B <- C::b1 <- C
>>  int <----------------- B::z <- B <- C::b2 <- C
>>  any pointer <--------------------- C::p  <- C
>> 
>> The type DAG has two types of TBAA nodes:
>> 1> the existing scalar nodes
>> 2> the struct nodes (this is different from the current tbaa.struct)
>>   A struct node has a unique name plus a list of pairs (field name, field type).
>>   For example, struct node for "C" should look like
>>   !4 = metadata !{"C", "C::b1", metadata !3, "C::b2", metadata !3, "C::p", metadata !2}
>>   where !3 is the struct node for "B", !2 is pointer type.
>> 
>> Given a field access
>>  struct B *bp = ...;
>>  bp->a.x = 5;
>> we annotate it as B::a.x.
> 
> In the case of multiple structures containing substructures, how are
> you differentiating?
> 
> IE given
> 
> struct A {
> struct B b;
> }
> struct C {
> struct B b;
> }
> 
> How do you know the above struct B *bp =...; is B::b from C and not B::b from A?
> 
> (I agree you can know in the case of direct aggregates, but I argue
> you have no way to know in the case of pointer arguments without
> interprocedural analysis) 
> It gets worse the more levels you have.
> 
> ie if you add
> struct B {
> struct E e;
> }
> 
> and have struct E *e = ...
> how do you know it's the B::e contained in struct C, or the B::e
> contained in struct A?
> 
> 
> Again, i agree you can do both scalar and direct aggregates, but not
> aggregates and scalars through pointers.
> 

I don't have immediate plan of pointer analysis. For the given example, we will treat field accesses from bp (pointer to struct B) as B::x.x, bp can be from either C or A.

> 
>> Implementing the Hierarchy
>> --------------------------
>> We can attach metadata to both scalar accesses and aggregate accesses. Let's call scalar tags and aggregate tags.
>> Each tag can be a sequence of nodes in the type DAG.
>> !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ]
>> 
> 
> This can get quite deep quite quickly.
> Are there actually cases where knowing the outermost aggregate type +
> {byte offset, size} of field access does not give you the exact same
> disambiguation capability?
The answer is no. We should be able to deduce the access path from {byte offset, size}.
However, I don't know an easy way to check alias(x,y) given {byte offset, size} for x and y.

Thanks,
Manman

> 
> I ask because in GCC, we determined the answer was "no", since
> pointers require special analysis anyway.