[LLVMdev] PROPOSAL: struct-access-path aware TBAA
Daniel Berlin
dberlin at dberlin.org
Mon Mar 11 13:17:21 PDT 2013
On Mon, Mar 11, 2013 at 11:41 AM, Manman Ren <mren at apple.com> wrote:
> Based on discussions with John McCall
>
> We currently focus on field accesses of structs, more specifically, on fields that are scalars or structs.
>
> Fundamental rules from C11
> --------------------------
> An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]
> 1. a type compatible with the effective type of the object,
> 2. a qualified version of a type compatible with the effective type of the object,
> 3. a type that is the signed or unsigned type corresponding to the effective type of the object,
> 4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
> 5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
> 6. a character type.
>
> Example
> -------
> struct A {
> int x;
> int y;
> };
> struct B {
> A a;
> int z;
> };
> struct C {
> B b1;
> B b2;
> int *p;
> };
>
> Type DAG:
> int <- A::x <- A
> int <- A::y <- A <- B::a <- B <- C::b1 <- C
> int <----------------- B::z <- B <- C::b2 <- C
> any pointer <--------------------- C::p <- C
>
> The type DAG has two types of TBAA nodes:
> 1> the existing scalar nodes
> 2> the struct nodes (this is different from the current tbaa.struct)
> A struct node has a unique name plus a list of pairs (field name, field type).
> For example, struct node for "C" should look like
> !4 = metadata !{"C", "C::b1", metadata !3, "C::b2", metadata !3, "C::p", metadata !2}
> where !3 is the struct node for "B", !2 is pointer type.
>
> Given a field access
> struct B *bp = ...;
> bp->a.x = 5;
> we annotate it as B::a.x.
In the case of multiple structures containing substructures, how are
you differentiating?
IE given
struct A {
struct B b;
}
struct C {
struct B b;
}
How do you know the above struct B *bp =...; is B::b from C and not B::b from A?
(I agree you can know in the case of direct aggregates, but I argue
you have no way to know in the case of pointer arguments without
interprocedural analysis)
It gets worse the more levels you have.
ie if you add
struct B {
struct E e;
}
and have struct E *e = ...
how do you know it's the B::e contained in struct C, or the B::e
contained in struct A?
Again, i agree you can do both scalar and direct aggregates, but not
aggregates and scalars through pointers.
> Implementing the Hierarchy
> --------------------------
> We can attach metadata to both scalar accesses and aggregate accesses. Let's call scalar tags and aggregate tags.
> Each tag can be a sequence of nodes in the type DAG.
> !C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ]
>
This can get quite deep quite quickly.
Are there actually cases where knowing the outermost aggregate type +
{byte offset, size} of field access does not give you the exact same
disambiguation capability?
I ask because in GCC, we determined the answer was "no", since
pointers require special analysis anyway.
More information about the llvm-dev
mailing list