[LLVMdev] PROPOSAL: struct-access-path aware TBAA (new version)

Manman Ren mren at apple.com
Wed Mar 27 13:11:16 PDT 2013


Hello,

After discussions with Daniel, Dan and others, here is an updated proposal for struct-access-path aware TBAA.

Given an example
struct A {
  int x;
  int y;
};
struct B {
  A a;
  int z;
};
struct C {
  B b1;
  B b2;
  int *p;
};
struct D {
  C c;
};

The purpose of struct-path-aware TBAA is to say
"C::b1.a" will alias with "B::a.x", "C::b1.a" will alias with "B",
"C::b1.a" does not alias with "D::c.b2.a.x".
"A::x" does not alias with "A::y"
We can potentially disambiguate aggregate accesses and scalar accesses.

The current scalar TBAA will say
"C::b1.a.x" will alias with "D::c.b2.a.x" since both are accesses to int.
"A::x" will alias with "A::y"

This proposal is about the format of metadata and how to implement alias queries.
--------- Option A
A> metadata for the type DAG
   1> the existing scalar nodes
      !0 = metadata !{metadata !"any pointer", metadata !1}
      !1 = metadata !{metadata !"omnipotent char", metadata !2}
      !2 = metadata !{metadata !"Simple C/C++ TBAA"}
      !4 = metadata !{metadata !"int", metadata !1}
   2> the struct nodes
      A struct node has a unique name plus a list of field types
      For example, struct node for "C" should look like
      !5 = metadata !{"C", metadata !3, metadata !3, metadata !0}
      where !3 is the struct node for "B"

   The type DAG for the example:
   int <------------- A <- B <-- C <-- D
   int <-------------------- B
   any pointer <--------------- C

B> We call metadata attached to scalar accesses scalar tags and metadata
   attached to aggregate accesses aggregate tags.
   Each tag can be a sequence of field accesses and nodes in the type DAG.
   For example, the tag for "C::b2.a.x" is
   !7 := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ] ;C::b2.a.x
   where !C, !B, !A are metadata nodes in the type DAG, "C::b2" "B::a" "A::x" are strings.
C> alias(x, y) = alias_rule(x, y) or alias_rule(y, x), where alias_rule(x, y) is
   if the first element of 'x' does not enclose the first element of 'y'
     return NoAlias
   if the last element of 'x' encloses the first element of 'y' via type DAG,
     return Alias
   Check the first element of 'y', if it is not in 'x', return NoAlias
   Repeat until we reach the end of either 'x' or 'y'
     Check the next element of 'y' with the next element of 'x'
     if not the same, return false.

-------------- Option B
Option B is to include offset of each field in the type DAG and let tag
be offset based.
A> metadata for the type DAG
   1> the existing scalar nodes
   2> the struct nodes
      A struct node has a unique name plus a list of pairs (offset, field type)
      For example, struct node for "C" should look like
      !5 = metadata !{"C", 0, metadata !3, 12, metadata !3, 24, metadata !0}
      where !3 is the struct node for "B"
   The type DAG for the example:
   int <- 0 <- A
   int <- 4 <- A <- 0 <- B <- 0  <- C <- 0 <- D
   int <-------------- 8 <- B <- 12 <- C
   any pointer <--------------- 24 <- C

B> The tag will be (outermost struct type node, offset, size, the last type node)
   For example, the tag for C::b2.a.x will be
   !7 = [ "tbaa.offset", !C, 12, 4, !int] ;C::b2.a.x
   The last field of the tag is to handle alias(D::c, A::x), the tags will be
   [!D, 0, 32, !C] and [!A, 0, 4, !int], we only need to check whether !C encloses
   !A.

C> Consider access pair "C::b1.a.x" and "D::c.b2.a.x", the tags are [!C, 0, 4, !int]
   and [!D, 12, 4, !int].
   To check if they alias, we first check if !D encloses !C. If no, we return
   NoAlias. Since !D does enclose !C in our example, we start tracing from !D
   and follow the edge with the correct offset by
   adjusting the offset to be relative to the next node. We will reach C in the
   type DAG, now we have a common type node and can compare the (offset, size)
   directly. [!D, 12, 4] will be adjusted to [!C, 12, 4], since it does not
   overlap with [!C, 0, 4], we say they do not alias.

   alias_rule(x, y) is
   if the outermost type of 'x' does not enclose the outermost type of 'y',
     return NoAlias
   if the last type of 'x' encloses the outermost type of 'y'
     return Alias
   Start from the outermost type of 'x'
   Repeat until we reach the outermost type of 'y'
     follow the edge with the correct offset in the type DAG
     adjust the offset to be relative to the next node
   Compare the adjusted offset, size pair for 'x' against 'y'
   if they overlap, return Alias
   Otherwise, return NoAlias

------------------ Comparison
The complexity of alias queries: similar
The space for metadata: option A should require more space because the tag is a
  sequence of nodes and strings.
I will work on Option B.

------------------ Holes
We use BasicAA to perform sanity check. If BasicAA returns MayAlias, TBAA will
kick off with full speed.

Watch out for cases where the incoming arguments have different types but they
do alias. Scalar TBAA has the same issue but we don't see bugs coming up. If
we see bugs coming up with struct-path aware TBAA, we can do less aggressive
TBAA or apply similar tricks as GCC.

Scalar TBAA will say NoAlias for the placement new example under
test/Analysis/TypeBasedAliasAnalysis/placement-tbaa.ll.
If we optimize placement-tbaa.ll with -O1 -gvn, BaiscAA will start saying
MustAlias. This may cause problem.




More information about the llvm-dev mailing list