[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Daniel Berlin dberlin at dberlin.org
Tue Mar 12 23:09:57 PDT 2013


Someone privately asked me to explain this example, so here goes ...

> There are simpler examples of this kind for  C++, because placement
> new can change the dynamic type of the object (I actually haven't
> looked to see if they changed this in 2012, but it was definitely
> legal in C++98):
>
> #include <new>
> struct Foo { long i; };
> struct Bar { void *p; };
> long foo(int n)
> {
>   Foo *f = new Foo;
>   f->i = 1;
>   for (int i=0; i<n; ++i)
>     {
>       Bar *b = new (f) Bar;
>       b->p = 0;
>       f = new (f) Foo;
>       f->i = i;
>     }
>   return f->i;
> }
>
> Both access to the same memory, both will end up with completely
> different access paths, both legal by TBAA rules, but access path
> alone will claim no-alias.
>
> (This is taken from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286)


At the language level, both f and b are not the same "object".  The
lifetime of b ends with the f = new (f) Foo.
This is actually irrelevant at the IR level, however.

What this transforms into at the IR level is something like this.
I've elided the fact that it will  really start out one level
indirected, and then get promoted to regs, and pretended we have a
load/store at offset instead of getelementptr.


 #include <new>
 struct Foo { long i; };
 struct Bar { void *p; };
 long foo(int n)
 {
   Foo *f1 = call operator new
   store value 1 into offset 0 of f

   for (int i=0; i<n; ++i)
     {
       Foo *f4 = phi(f1, f2)
       Bar *b = cast f4 to type Bar *
       store value 0 into offset 0 of b
       Foo *f2 = cast b to type Foo *
       store value i into offset of f2
    }
   Foo *f3 = phi(f1, f2)
 temp =  load offset 0 of f3
 return temp;
 }

If you annotate the loads/stores with access paths, these paths will
say "no alias" if you ask if the stores alias.
Thus, you will feel free to reorder the stores.
If you reorder the loop stores, you will return 0 instead of n.
The language guarantees the function to return n.

The cause of this is simple: C++ provides a way to legally and
dynamically change the TBAA type of a pointer.  Without evaluating the
placement new's, you can't know what that new type is.  You also need
what the thing you placement new'd pointed to to know what now legally
shares memory, despite TBAA, and even if the objects in that memory do
not share the same lifetime.

At the IR level, it's all just memory, loads and stores, and if you
are asked "do these pointers alias", the answer is "yes", and if you
annotate them with TBAA paths, the paths, as described, this will
cause you to say "no".

FWIW: GCC does three things now in this case:
1. These are transformed into "change dynamic type expressions"
2. For the higher level IR, it unions the type's TBAA info and the
location's points-to sets when it sees a "change dynamic type
expression"
3. For the lower level, where this won't work, we assume the pointer
can alias anything)



More information about the llvm-dev mailing list