[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Mon Mar 11 11:41:42 PDT 2013

Based on discussions with John McCall

For now we focus on accesses to struct fields, specifically fields whose types are scalars or structs.

Fundamental rules from C11 (6.5p7)
----------------------------------
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [footnote: The intent of this list is to specify those circumstances in which an object may or may not be aliased.]
1. a type compatible with the effective type of the object,
2. a qualified version of a type compatible with the effective type of the object,
3. a type that is the signed or unsigned type corresponding to the effective type of the object,
4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
6. a character type.

Example   
-------  
  struct A {
    int x;
    int y;
  };
  struct B {
    struct A a;
    int z;
  };
  struct C {
    struct B b1;
    struct B b2;
    int *p;
  };

Type DAG (arrows point from a struct to each of its fields, and from a field to its type):
  int <- A::x <- A  
  int <- A::y <- A <- B::a <- B <- C::b1 <- C
  int <----------------- B::z <- B <- C::b2 <- C
  any pointer <--------------------- C::p  <- C 

The type DAG has two types of TBAA nodes:
1> the existing scalar nodes
2> the struct nodes (these are different from the existing tbaa.struct metadata)
   A struct node has a unique name plus a list of (field name, field type) pairs.
   For example, the struct node for "C" would look like
   !4 = metadata !{metadata !"C", metadata !"C::b1", metadata !3, metadata !"C::b2", metadata !3, metadata !"C::p", metadata !2}
   where !3 is the struct node for "B" and !2 is the scalar node for "any pointer".
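A minimal C++ sketch of the struct nodes, for illustration only. It assumes each struct node is just a table entry mapping a struct name to its ordered (field name, field type) pairs, and that scalar types are plain names with no entry. TypeDAG, exampleDAG and reaches are hypothetical names; reaches() is the "can reach via type DAG" test that the aliasing rules below rely on.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a struct node: ordered (field name, field type) pairs.
// Scalar types such as "int" or "any pointer" have no entry in the table.
using Field = std::pair<std::string, std::string>;
using TypeDAG = std::map<std::string, std::vector<Field>>;

// The example structs A, B and C.
TypeDAG exampleDAG() {
  return {
      {"A", {{"A::x", "int"}, {"A::y", "int"}}},
      {"B", {{"B::a", "A"}, {"B::z", "int"}}},
      {"C", {{"C::b1", "B"}, {"C::b2", "B"}, {"C::p", "any pointer"}}},
  };
}

// True if 'to' is 'from' itself or can be reached from 'from' by following
// field edges. Struct containment is acyclic, so the recursion terminates.
bool reaches(const TypeDAG &dag, const std::string &from, const std::string &to) {
  if (from == to)
    return true;
  auto it = dag.find(from);
  if (it == dag.end())
    return false;  // scalar node: no outgoing field edges
  for (const Field &f : it->second)
    if (reaches(dag, f.second, to))
      return true;
  return false;
}
```

With the example table, "C" reaches "int" and "any pointer", while "B" does not reach "any pointer" because B has no pointer field.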

Given a field access
  struct B *bp = ...;
  bp->a.x = 5;
we annotate it as B::a.x.
This creates an obvious hierarchy of specificity:
  'B::a.x'
    |-> 'A::x'
      |-> 'int'
Each step outwards adds extra contextual information for an access to an 'int' object.
We call each step inwards the super-type relation. It peels a field access from the left: B::a.x becomes A::x because B::a has type A.

There exists a second hierarchy, that of containment:
  'B::a.x'
    |-> 'B::a'
      |-> 'B'
Each step outwards "extends" the object into a subobject. Each step inwards "retracts" the object into its container.
"retraction" peels a field access from the right; "extension" appends a field access at the right.
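Both hierarchies can be illustrated with a small C++ sketch. It assumes, hypothetically, that an access path is stored as a base type plus a list of field names, and that field types are looked up by qualified field name. superTypeChain peels field accesses from the left (replacing the base with the peeled field's type); containmentChain drops them from the right.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical access path such as B::a.x: a base type plus field names.
struct AccessPath {
  std::string base;                 // e.g. "B"
  std::vector<std::string> fields;  // e.g. {"a", "x"}
};

// Field types keyed by qualified field name (structs A and B from the example).
const std::map<std::string, std::string> kFieldType = {
    {"A::x", "int"}, {"A::y", "int"}, {"B::a", "A"}, {"B::z", "int"}};

// Render a path in the proposal's notation: "B::a.x", "B::a", or just "B".
std::string toString(const AccessPath &p) {
  std::string s = p.base;
  for (size_t i = 0; i < p.fields.size(); ++i)
    s += (i == 0 ? "::" : ".") + p.fields[i];
  return s;
}

// Super-type chain: peel the leftmost field access until only a type remains.
// B::a.x -> A::x -> int
std::vector<std::string> superTypeChain(AccessPath p) {
  std::vector<std::string> chain{toString(p)};
  while (!p.fields.empty()) {
    auto it = kFieldType.find(p.base + "::" + p.fields.front());
    assert(it != kFieldType.end() && "unknown field");
    p.base = it->second;
    p.fields.erase(p.fields.begin());
    chain.push_back(toString(p));
  }
  return chain;
}

// Containment chain: retract by dropping the rightmost field access.
// B::a.x -> B::a -> B
std::vector<std::string> containmentChain(AccessPath p) {
  std::vector<std::string> chain{toString(p)};
  while (!p.fields.empty()) {
    p.fields.pop_back();
    chain.push_back(toString(p));
  }
  return chain;
}
```

Note that only the super-type step needs the type DAG; retraction is purely syntactic.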

Aliasing Rules
--------------
alias(x,y) = rule(x,y) or rule(y,x)
rule(x,y) has 3 cases:
I> 'x' and 'y' do not have a common struct type
  rule_case1(x,y) returns true if the last element of 'x' can reach the first element of 'y' via the type DAG
  |  x  |
         extends
  |    z         |
      super-type
           |  y  |
II> 'x' and 'y' have a common struct type
  Starting from the first common type, the field accesses of 'x' and 'y' should match until the end of 'x' or 'y' is reached.
  case a:
  |  x   |
     super-type
      |z'|
        extends
      |   y      |
  case b:
  |  x            |
        retracts
  |   z     |
      super-type
      |  y  |

Combining the three cases gives rule(x,y):
Look for the first element of 'y' in 'x'; if it is not in 'x', return rule_case1(x,y).
Otherwise, compare each following element of 'y' with the corresponding following element of 'x'; if any pair differs, return false.
When the end of either 'x' or 'y' is reached, return true.
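A C++ sketch of rule(x,y) and alias(x,y) following the steps above, for illustration only. It assumes a hypothetical AccessPath (base type plus field names) and computes the types visited along 'x' (C::b1.a.x visits C, B, A, int) so that the first element of 'y' can be located in 'x'. Because a struct cannot contain itself, each type appears at most once along a path, so the first match is the only one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct AccessPath {
  std::string base;                 // first element, e.g. "C"
  std::vector<std::string> fields;  // e.g. {"b1", "a", "x"}
};

// Struct nodes from the example: struct name -> (field name, field type).
using Fields = std::vector<std::pair<std::string, std::string>>;
const std::map<std::string, Fields> kStructs = {
    {"A", {{"x", "int"}, {"y", "int"}}},
    {"B", {{"a", "A"}, {"z", "int"}}},
    {"C", {{"b1", "B"}, {"b2", "B"}, {"p", "any pointer"}}},
};

std::string fieldType(const std::string &s, const std::string &f) {
  auto it = kStructs.find(s);
  assert(it != kStructs.end() && "not a struct");
  for (const auto &fld : it->second)
    if (fld.first == f)
      return fld.second;
  assert(false && "unknown field");
  return "";
}

// Type DAG reachability; a type reaches itself.
bool reaches(const std::string &from, const std::string &to) {
  if (from == to)
    return true;
  auto it = kStructs.find(from);
  if (it == kStructs.end())
    return false;
  for (const auto &fld : it->second)
    if (reaches(fld.second, to))
      return true;
  return false;
}

// Types visited along a path: tx[k] is the type that x.fields[k] is applied to.
std::vector<std::string> pathTypes(const AccessPath &p) {
  std::vector<std::string> t{p.base};
  for (const auto &f : p.fields)
    t.push_back(fieldType(t.back(), f));
  return t;
}

bool rule(const AccessPath &x, const AccessPath &y) {
  std::vector<std::string> tx = pathTypes(x);
  size_t k = 0;
  while (k < tx.size() && tx[k] != y.base)
    ++k;
  if (k == tx.size())                   // first element of y is not in x:
    return reaches(tx.back(), y.base);  // rule_case1
  // Common type found: following field accesses must agree until x or y ends.
  for (size_t j = 0; j < y.fields.size() && k + j < x.fields.size(); ++j)
    if (y.fields[j] != x.fields[k + j])
      return false;
  return true;
}

bool alias(const AccessPath &x, const AccessPath &y) {
  return rule(x, y) || rule(y, x);
}
```

With the example structs, C::b1.a.x aliases A::x but not B::z, an aggregate access of C::b2 may alias A::y (case I), and C::b1 does not alias C::b2.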

If we only care about aliasing between scalar accesses, the above rules can be simplified:
alias(scalar access x, scalar access y) = super_type(x,y) or super_type(y,x)
where super_type(x,y) is true if 'x' can reach 'y' via the super-type relation.
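A C++ sketch of the scalar-only simplification. It assumes scalar tags are stored as the alternating sequence of type nodes and field names used in the next section (without the "tbaa.path" marker), and that field names are qualified (e.g. "A::x") so they are unique. Under those assumptions, peeling from the left is a suffix check that must start at a type node. ScalarTag, superType and aliasScalar are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scalar access tag: alternating type nodes and field names,
// ending in a scalar type, e.g. B::a.x = {"B", "B::a", "A", "A::x", "int"}.
using ScalarTag = std::vector<std::string>;

// super_type(x, y): y is reachable from x by peeling (type, field) pairs off
// the left, i.e. y is a suffix of x that starts at a type node.
bool superType(const ScalarTag &x, const ScalarTag &y) {
  if (y.size() > x.size() || (x.size() - y.size()) % 2 != 0)
    return false;
  return std::equal(y.begin(), y.end(),
                    x.end() - static_cast<std::ptrdiff_t>(y.size()));
}

bool aliasScalar(const ScalarTag &x, const ScalarTag &y) {
  return superType(x, y) || superType(y, x);
}
```

For example, B::a.x aliases A::x and a plain int access, but not A::y; B::z and A::x do not alias.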

Implementing the Hierarchy
--------------------------
We can attach metadata to both scalar accesses and aggregate accesses; let's call these scalar tags and aggregate tags.
Each tag can be represented as a sequence of nodes in the type DAG, for example:
!C::b2.a.x := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ]
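A sketch, in C++, of how such a tag could be built from an access path by walking the struct nodes. Plain strings stand in for the metadata node references (!C, !B, ...), and kStructNodes and makePathTag are hypothetical names, not existing LLVM APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Struct nodes from the example: each field is (qualified field name, field type).
using Fields = std::vector<std::pair<std::string, std::string>>;
const std::map<std::string, Fields> kStructNodes = {
    {"A", {{"A::x", "int"}, {"A::y", "int"}}},
    {"B", {{"B::a", "A"}, {"B::z", "int"}}},
    {"C", {{"C::b1", "B"}, {"C::b2", "B"}, {"C::p", "any pointer"}}},
};

// Build the node sequence for an access such as C::b2.a.x:
// ["tbaa.path", C, "C::b2", B, "B::a", A, "A::x", int]
std::vector<std::string> makePathTag(const std::string &base,
                                     const std::vector<std::string> &fields) {
  std::vector<std::string> tag{"tbaa.path", base};
  std::string type = base;
  for (const std::string &f : fields) {
    const std::string qualified = type + "::" + f;
    auto it = kStructNodes.find(type);
    assert(it != kStructNodes.end() && "field access on a non-struct type");
    std::string next;  // type of the accessed field
    for (const auto &fld : it->second)
      if (fld.first == qualified)
        next = fld.second;
    assert(!next.empty() && "unknown field");
    tag.push_back(qualified);  // field name
    tag.push_back(next);       // field type node
    type = next;
  }
  return tag;
}
```

Calling makePathTag("C", {"b2", "a", "x"}) yields the same sequence as !C::b2.a.x, and makePathTag("B", {"z"}) yields ["tbaa.path", B, "B::z", int].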

Comments and questions are welcome.

Manman