[LLVMdev] PROPOSAL: struct-access-path aware TBAA

Wed Mar 13 14:01:13 PDT 2013

On Wed, Mar 13, 2013 at 1:49 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
> On 3/13/13 1:21 PM, Daniel Berlin wrote:
>>
>> On Wed, Mar 13, 2013 at 11:37 AM, Arnold Schwaighofer
>> <aschwaighofer at apple.com> wrote:
>>>
>>> On Mar 13, 2013, at 1:07 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>>
>>>>> The program I gave was well typed :)
>>>>
>>>> Hi, Daniel:
>>>>    Thank you for sharing your insight.  I didn't realize it is
>>>> well-typed -- I'm basically a big nut of any std.
>>>> I'd admit std/spec is one of the most boring materials on this planet:-).
>>>>
>>>>    So, if I understand correctly, your point is:
>>>>        if a std allows a type-cast (even one which is in
>>>> bad taste:-), TBAA has to respect that std.
>>>>
>>>>   If that is strictly true, TBAA has to rely on points-to analysis.
>>>> However, that would virtually disable
>>>> TBAA, as most points-to sets have an "unknown" element.
>>>>
>>>>    Going back to my previous mail,
>>>>>
>>>>> In the below example, GCC assumes p and q point to anything because
>>>>> they are incoming arguments.
>>>>>
>>>>>> ------------------------------
>>>>>> typedef struct {
>>>>>>      int x;
>>>>>> }T1;
>>>>>>
>>>>>> typedef struct {
>>>>>>      int y;
>>>>>> }T2;
>>>>>>
>>>>>> int foo(T1 *p, T2 *q) {
>>>>>>      p->x = 1;
>>>>>>      q->y = 4;
>>>>>>      return p->x;
>>>>>> }
>>>>>> --------------------------
>>>>
>>>> Yes, gcc should assume p and q point to anything; however, the result
>>>> contradicts that assumption --
>>>> it promotes the p->x expression.
>>>
>>>
>>> Assuming above is C11 code, I think the relevant section in the C spec is
>>> the following:
>>>
>>> This is a paragraph from a C11 draft ("N1570 Committee Draft — April 12,
>>> 2011"). Assuming my interpretation of it is correct: It seems to imply that
>>> a store to an lvalue can change its subsequent effective type? This would
>>> preclude any purely type-based TBAA solution and would, in general, require
>>> taking access/points-to information into account.
>>>
>>> ---
>>> 6.5 Expressions
>>>
>>> 6: "The effective type of an object for an access to its stored value is
>>> the declared type of the object, if any. If a value is stored into an object
>>> having no declared type through an lvalue having a type that is not a
>>> character type, then the type of the lvalue becomes the effective type of
>>> the object for that access and for subsequent accesses that do not modify
>>> the stored value. If a value is copied into an object having no declared
>>> type using memcpy or memmove, or is copied as an array of character type,
>>> then the effective type of the modified object for that access and for
>>> subsequent accesses that do not modify the value is the effective type of
>>> the object from which the value is copied, if it has one. For all other
>>> accesses to an object having no declared type, the effective type of the
>>> object is simply the type of the lvalue used for the access."
>>> ---
>>>
>>> This is just before paragraph 6.5 Expressions 7 that is quoted in the
>>> current TBAA proposal.
>>>
>>>   "If a value is stored into an object having no declared type through an
>>> lvalue having a type that is not a character type, then the type of the
>>> lvalue becomes the effective type of the object for that access and for
>>> subsequent accesses that <<do not modify>> the stored value."
>>>
>>> I read this as "A store will set the "effective type" for any subsequent
>>> read access" on the same object. So, in the above example, assuming
>>> that p and q point to the same object, the effective type is changed from
>>> the first to the second line. Which means that IF p and q pointed to the
>>> same object the read access to "p->x" using the old effective type is
>>> undefined. Hence, we may assume that p and q don't point to the same
>>> object.
>>
>> Yes, C is quite different than C++ here.
>>
>> GCC will feel free to move these particular stores around, even though
>> it believes they point anywhere, but won't in my placement new C++
>> case, because they *must* point to the same memory.
>
>
> For this specific case, I actually tried both g++ and gcc last night; there
> is no difference.

The placement new case i cited should keep both around. If it's not,
i'd love to know :)
It's in the G++ testsuite.
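
For readers without the earlier message at hand, a minimal sketch of the kind of placement-new storage reuse being discussed might look like the following. This is not the testsuite program itself; the names reuse_storage and run_example are made up for illustration, and T1/T2 are borrowed from the C example quoted above.

```cpp
#include <new>

struct T1 { int x; };
struct T2 { int y; };

// Hypothetical helper: builds a T1 and then a T2 in the same storage.
T2 *reuse_storage(void *buf) {
    T1 *p = new (buf) T1;  // the storage now holds a T1
    p->x = 1;
    T2 *q = new (buf) T2;  // T1's lifetime ends; the storage now holds a T2
    q->y = 4;
    return q;
}

// Hypothetical driver: supplies a buffer big and aligned enough for both types.
int run_example() {
    alignas(T1) alignas(T2) unsigned char buf[sizeof(T1) + sizeof(T2)];
    T2 *q = reuse_storage(buf);
    return q->y;           // reads back the T2 member stored last
}
```

Both stores hit the same bytes by construction, so a compiler that reordered them purely because T1 and T2 are different types would leave the wrong value in memory.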

>
> I'm wondering why the "dynamic type" would help to make TBAA 100% safe and
> also helpful.

It's just a case you need to punt on *TBAA*, not give up on memory aliasing.
If

>
>  Suppose the points-to sets for the two memory accesses are pt1 and pt2,
> dyn-type1 = union-of-all-types-of-the-elements-in-pt1,
> dyn-type2 = union-of-all-types-of-the-elements-in-pt2.
>
> If dyn-type1 and dyn-type2 are disjoint, then pt1 and pt2 must be
> disjoint, which means the points-to
> analysis already proves these two memory accesses do not alias. We don't need
> TBAA at all.
Correct. This is only about where it's legal to use TBAA, not about
where it's legal to disambiguate aliasing :)
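
The quoted argument can be sketched in a few lines, under the simplifying assumption that every abstract object in a points-to set has exactly one known type and that no set contains an "unknown" element. All names here (ObjectId, PointsToSet, TypeOf, dyn_type, disjoint) are hypothetical and not taken from any compiler.

```cpp
#include <map>
#include <set>
#include <string>

using ObjectId = int;
using PointsToSet = std::set<ObjectId>;
// Assumption: each abstract object has exactly one type, named by a string.
using TypeOf = std::map<ObjectId, std::string>;

// Union of the types of all objects in a points-to set.
std::set<std::string> dyn_type(const PointsToSet &pt, const TypeOf &type_of) {
    std::set<std::string> types;
    for (ObjectId o : pt)
        types.insert(type_of.at(o));
    return types;
}

// True when the two sets share no element.
template <class T>
bool disjoint(const std::set<T> &a, const std::set<T> &b) {
    for (const T &x : a)
        if (b.count(x))
            return false;
    return true;
}
```

Under that assumption, an object shared by pt1 and pt2 would put its type into both unions, so disjoint type unions can only come from disjoint points-to sets. The converse does not hold: two distinct objects of the same type give disjoint points-to sets with overlapping type unions.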

> If "dynamic type" is just a kind of incremental safety enhancement
> (instead of a guarantee of correctness), it is not very hard
> to add such an enhancement as well: just walk the IR top-down, starting from
> some bad type-cast etc.,
> and invalidate the metadata annotated on the memory accesses that use
> pointers that are badly type-cast
> from other places.

Yes, this is what GCC does, we propagate around the points-to-anything
and can-alias-anything bits
(these are separate bits)
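
As a rough illustration only, a top-down propagation of two such flags over a toy list of pointer definitions could look like this. It is not GCC's implementation: the PtrInfo layout, the derived_from edges, and the rule that can_alias_anything disables TBAA are all assumptions made for the sketch.

```cpp
#include <vector>

// Hypothetical per-pointer record in a toy IR.
struct PtrInfo {
    bool points_to_anything = false;
    bool can_alias_anything = false;
    std::vector<int> derived_from;  // indices of earlier pointers this one is computed from
};

// Assumes pointers are listed definition-before-use, so one forward pass suffices.
void propagate(std::vector<PtrInfo> &ptrs) {
    for (PtrInfo &p : ptrs) {
        for (int src : p.derived_from) {
            // Each bit is carried along independently of the other.
            p.points_to_anything = p.points_to_anything || ptrs[src].points_to_anything;
            p.can_alias_anything = p.can_alias_anything || ptrs[src].can_alias_anything;
        }
    }
}

// Assumed policy for the sketch: skip TBAA for accesses through a flagged pointer.
bool may_use_tbaa(const PtrInfo &p) {
    return !p.can_alias_anything;
}
```

Keeping the two flags separate means a pointer can lose its type-based guarantees without also losing whatever points-to information is known about it, and vice versa.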

>
> Also, I'm wondering: if we wrap the placement new operator in a
> function, will gcc still compile
> the code correctly?

Yes, but only by accident.  :)