[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Shuxin Yang shuxin.llvm at gmail.com
Tue Mar 12 23:59:20 PDT 2013


I personally believe that in the context of type-based AA, correctness 
is a subjective term:-).

If the user smell something fishy, it is up to user to disable such opt, 
there is no other
way around.  TBAA is just to find the a sweet spot between precision & 
safeness.
Unfortunately, in the context of TBAA, precision & safeness usually come 
at each other's expense...

It would  be nice if we can union the dynamic types of each elements in 
point-to set. However,
we certainly can live without if the program being compiled is well 
typed. I don't think it is
a dispensable tool for this kind of opt.

For some nasty license issue, I'm not allowed to peruse and discuss gcc 
internals.
I'm just using the gcc 4.6.3 binary shipped with Ubuntu to compile 
following snippet. I'm wondering
why the p->x is promoted. In the light of "dynamic type",
shouldn't p and q's point-to sets are "everything whose addr is taken".


------------------------------
typedef struct {
     int x;
}T1;

typedef struct {
     int y;
}T2;

int foo(T1 *p, T2 *q) {
     p->x = 1;
     q->y = 4;
     return p->x;
}
--------------------------



On 03/12/2013 11:09 PM, Daniel Berlin wrote:
> Someone privately asked me to explain this example, so here goes ...
>
>> There are simpler examples of this kind for  C++, because placement
>> new can change the dynamic type of the object (I actually haven't
>> looked to see if they changed this in 2012, but it was definitely
>> legal in C++98):
>>
>> #include <new>
>> struct Foo { long i; };
>> struct Bar { void *p; };
>> long foo(int n)
>> {
>>    Foo *f = new Foo;
>>    f->i = 1;
>>    for (int i=0; i<n; ++i)
>>      {
>>        Bar *b = new (f) Bar;
>>        b->p = 0;
>>        f = new (f) Foo;
>>        f->i = i;
>>      }
>>    return f->i;
>> }
>>
>> Both access to the same memory, both will end up with completely
>> different access paths, both legal by TBAA rules, but access path
>> alone will claim no-alias.
>>
>> (This is taken from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286)
>
> At the language level, both f and b are not the same "object".  The
> lifetime of b ends with the f = new (f) Foo.
> This is actually irrelevant at the IR level, however.
>
> What this transforms into at the IR level is something like this.
> I've elided the fact that it will  really start out one level
> indirected, and then get promoted to regs, and pretended we have a
> load/store at offset instead of getelementptr.
>
>
>   #include <new>
>   struct Foo { long i; };
>   struct Bar { void *p; };
>   long foo(int n)
>   {
>     Foo *f1 = call operator new
>     store value 1 into offset 0 of f
>
>     for (int i=0; i<n; ++i)
>       {
>         Foo *f4 = phi(f1, f2)
>         Bar *b = cast f4 to type Bar *
>         store value 0 into offset 0 of b
>         Foo *f2 = cast b to type Foo *
>         store value i into offset of f2
>      }
>     Foo *f3 = phi(f1, f2)
>   temp =  load offset 0 of f3
>   return temp;
>   }
>
> If you annotate the loads/stores with access paths, these paths will
> say "no alias" if you ask if the stores alias.
> Thus, you will feel free to reorder the stores.
> If you reorder the loop stores, you will return 0 instead of n.
> The language guarantees the function to return n.
>
> The cause of this is simple: C++ provides a way to legally and
> dynamically change the TBAA type of a pointer.  Without evaluating the
> placement new's, you can't know what that new type is.  You also need
> what the thing you placement new'd pointed to to know what now legally
> shares memory, despite TBAA, and even if the objects in that memory do
> not share the same lifetime.
>
> At the IR level, it's all just memory, loads and stores, and if you
> are asked "do these pointers alias", the answer is "yes", and if you
> annotate them with TBAA paths, the paths, as described, this will
> cause you to say "no".
>
> FWIW: GCC does three things now in this case:
> 1. These are transformed into "change dynamic type expressions"
> 2. For the higher level IR, it unions the type's TBAA info and the
> location's points-to sets when it sees a "change dynamic type
> expression"
> 3. For the lower level, where this won't work, we assume the pointer
> can alias anything)
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list