[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Wed Mar 13 13:49:41 PDT 2013

On 3/13/13 1:21 PM, Daniel Berlin wrote:
> On Wed, Mar 13, 2013 at 11:37 AM, Arnold Schwaighofer
> <aschwaighofer at apple.com> wrote:
>> On Mar 13, 2013, at 1:07 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>>>> The program I gave was well typed :)
>>> Hi, Daniel:
>>>    Thank you for sharing your insight.  I didn't realize it was well-typed -- I basically don't know the standards well.
>>> I'd admit the std/spec is some of the most boring material on this planet :-).
>>>
>>>    So, if I understand correctly, your point is:
>>>        if the std allows a type cast (even one in bad taste :-), TBAA has to respect the std.
>>>
>>>   If that is strictly true, TBAA has to rely on points-to analysis. However, that would virtually disable
>>> TBAA, as most points-to sets contain an "unknown" element.
>>>
>>>    Going back to my previous mail,
>>>> In the below example, GCC assumes p and q point to anything because
>>>> they are incoming arguments.
>>>>
>>>>> ------------------------------
>>>>> typedef struct {
>>>>>      int x;
>>>>> }T1;
>>>>>
>>>>> typedef struct {
>>>>>      int y;
>>>>> }T2;
>>>>>
>>>>> int foo(T1 *p, T2 *q) {
>>>>>      p->x = 1;
>>>>>      q->y = 4;
>>>>>      return p->x;
>>>>> }
>>>>> --------------------------
>>> Yes, gcc should assume p and q point to anything; however, the result contradicts that assumption --
>>> it promotes the p->x expression.
>>
>> Assuming above is C11 code, I think the relevant section in the C spec is the following:
>>
>> This is a paragraph from a C11 draft ("N1570 Committee Draft — April 12, 2011"). Assuming my interpretation of it is correct: it seems to imply that a store to an lvalue can change its subsequent effective type? This would preclude any purely type-based TBAA solution, and would, in general, require taking access/points-to information into account.
>>
>> ---
>> 6.5 Expressions
>>
>> 6: "The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access."
>> ---
>>
>> This is just before paragraph 6.5 Expressions 7 that is quoted in the current TBAA proposal.
>>
>>   "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that <<do not modify>> the stored value."
>>
>> I read this as "A store will set the "effective type" for any subsequent read access" on the same object. So, in the above example, assuming
>> that p and q point to the same object, the effective type is changed from the first to the second line. Which means that IF p and q pointed to the
>> same object, the read access to "p->x" using the old effective type is undefined. Hence, we may assume that p and q don't point to the same
>> object.
> Yes, C is quite different than C++ here.
>
> GCC will feel free to move these particular stores around, even though
> it believes they point anywhere, but won't in my placement new C++
> case, because they *must* point to the same memory.

For this specific case, I actually tried both g++ and gcc last night;
there is no difference.
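
For anyone who wants to repeat the comparison, here is a minimal sketch of the
quoted example in a form that can be compiled as both C and C++. The helper
call_foo_distinct is hypothetical (it is not from the thread); it only calls
foo on two distinct objects, which is well-defined under either reading of
the standard. The interesting part is what the compiler emits for the final
load in foo (e.g. inspect the output of "gcc -O2 -S").

```c
/* Same types and function as in the quoted example. */
typedef struct {
    int x;
} T1;

typedef struct {
    int y;
} T2;

int foo(T1 *p, T2 *q) {
    p->x = 1;
    q->y = 4;
    /* The question in this thread: may the compiler return the constant 1
       here instead of reloading p->x after the store to q->y? */
    return p->x;
}

/* Hypothetical helper, not from the thread: p and q refer to distinct
   objects, so no aliasing question arises and the result is 1. */
int call_foo_distinct(void) {
    T1 a = {0};
    T2 b = {0};
    return foo(&a, &b);
}
```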

I'm wondering how the "dynamic type" would make TBAA 100% safe
and still useful.

  Suppose the points-to sets for the two memory accesses are pt1 and pt2, and
dyn-type1 = union of the types of all elements in pt1,
dyn-type2 = union of the types of all elements in pt2.

If dyn-type1 and dyn-type2 are disjoint, then pt1 and pt2 must be
disjoint, which means the points-to analysis has already proven that
these two memory accesses do not alias. We don't need TBAA at all.
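
To make the argument above concrete, here is a toy sketch. Everything in it
is an assumption for illustration: Obj, type_bit and the three functions are
made-up names, each abstract object is assumed to have exactly one type, and
the "unknown" element of a points-to set is not modeled at all.

```c
#include <stddef.h>

/* Hypothetical model: an abstract memory object with a unique id and a
   single type, encoded as one bit in a mask. */
typedef struct {
    int id;
    unsigned type_bit;
} Obj;

/* dyn-type of a points-to set: union of the types of all its elements. */
unsigned dyn_type(const Obj *pt, size_t n) {
    unsigned mask = 0;
    for (size_t i = 0; i < n; i++)
        mask |= pt[i].type_bit;
    return mask;
}

/* Returns 1 if the two dyn-types share no type. */
int dyn_types_disjoint(const Obj *pt1, size_t n1, const Obj *pt2, size_t n2) {
    return (dyn_type(pt1, n1) & dyn_type(pt2, n2)) == 0;
}

/* Returns 1 if the two points-to sets share no object. */
int points_to_disjoint(const Obj *pt1, size_t n1, const Obj *pt2, size_t n2) {
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            if (pt1[i].id == pt2[j].id)
                return 0;
    return 1;
}
```

Under the one-type-per-object assumption, a shared object would contribute the
same type bit to both masks, so disjoint dyn-types imply disjoint points-to
sets. The converse does not hold: two distinct objects of the same type give
overlapping dyn-types but disjoint points-to sets.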

If "dynamic type" is just an incremental safety enhancement
(rather than a guarantee of correctness), it is not very hard
to add such an enhancement as well: just walk the IR top-down, starting
from some bad type cast etc., and invalidate the metadata annotated on
the memory accesses that use pointers derived from the bad type cast.
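
A rough sketch of that walk, on a toy straight-line IR rather than real LLVM
IR. All names here (Inst, OpKind, invalidate_tainted_tbaa, MAX_VALUES) are
hypothetical and none of this is an LLVM API. It assumes value ids are in
[0, MAX_VALUES), that instructions are listed in top-down order, and it
ignores control flow and pointers that escape through memory.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_VALUES = 64 };

typedef enum {
    OP_BAD_CAST, /* dst = (incompatible type) src */
    OP_DERIVE,   /* dst = some pointer computed from src (copy, GEP-like) */
    OP_ACCESS    /* load/store through pointer src */
} OpKind;

typedef struct {
    OpKind kind;
    int dst;       /* value id defined (unused for OP_ACCESS) */
    int src;       /* pointer operand value id */
    bool has_tbaa; /* TBAA tag present (meaningful for OP_ACCESS only) */
} Inst;

/* Walk the instructions top-down, mark every value derived from a bad
   cast, and drop the TBAA tag on accesses through such values.
   Returns the number of accesses whose tag was dropped. */
size_t invalidate_tainted_tbaa(Inst *insts, size_t n) {
    bool tainted[MAX_VALUES] = { false };
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        Inst *in = &insts[i];
        switch (in->kind) {
        case OP_BAD_CAST:
            tainted[in->dst] = true;
            break;
        case OP_DERIVE:
            if (tainted[in->src])
                tainted[in->dst] = true;
            break;
        case OP_ACCESS:
            if (tainted[in->src] && in->has_tbaa) {
                in->has_tbaa = false;
                dropped++;
            }
            break;
        }
    }
    return dropped;
}
```

For example, with v1 = bad-cast(v0), v2 = derive(v1), an access through v2 and
an access through an unrelated v3, only the access through v2 loses its tag.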

Also, I'm wondering: if we wrap the placement new operator in a
function, will gcc still compile the code correctly?

>
>
>
>> I don't know whether that reasoning underlies the decision that GCC makes but it would be a justification (assuming my reasoning above is correct).
>
>
>>
>> WRT the current TBAA proposal, this means we have to be aware that if we decide on a purely type/access-path-based solution, we might be breaking a lot more code than we do now.
>>
>> Best,
>> Arnold