[PATCH] D32199: [TBAASan] A TBAA Sanitizer (Clang)

Mon May 1 16:17:55 PDT 2017

On 05/01/2017 02:35 PM, Krzysztof Parzyszek via cfe-commits wrote:
> On 5/1/2017 2:16 PM, Hal Finkel via cfe-commits wrote:
>>
>> On 05/01/2017 12:49 PM, Daniel Berlin wrote:
>>> On 04/21/2017 06:03 AM, Hal Finkel via Phabricator wrote:
>>>> ...
>>>>
>>>>
>>>>    Our struct-path TBAA does the following:
>>>>
>>>>       struct X { int a, b; };
>>>>       X x { 50, 100 };
>>>>       X *o = (X*) (((int*) &x) + 1);
>>>>
>>>>       int a_is_b = o->a; // This is UB (or so we say)?
>>>>
>>>
>>> This is UB.
>>> A good resource for this stuff is 
>>> http://www.cl.cam.ac.uk/~pes20/cerberus/ which has a long 
>>> document where they explore all of these and what various compilers 
>>> do, along with what the standard seems to say.
>>
>> http://www.cl.cam.ac.uk/~pes20/cerberus/notes30-full.pdf is 172 
>> pages, and so I may have missed it, but I don't see this case. Also, 
>> I'd really like to see where the standard says this is UB. I don't 
>> see it.
>>
>
> The last sentence of 8:
>
>
> 6.5.6 Additive operators
>
>
> 7 For the purposes of these operators, a pointer to an object that is 
> not an element of an array behaves the same as a pointer to the first 
> element of an array of length one with the type of the object as its 
> element type.
>
> 8 When an expression that has integer type is added to or subtracted 
> from a pointer, the result has the type of the pointer operand. If the 
> pointer operand points to an element of an array object, and the array 
> is large enough, the result points to an element offset from the 
> original element such that the difference of the subscripts of the 
> resulting and original array elements equals the integer expression. 
> In other words, if the expression P points to the i-th element of an 
> array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N 
> (where N has the value n) point to, respectively, the i+n-th and 
> i−n-th elements of the array object, provided they exist. Moreover, if 
> the expression P points to the last element of an array object, the 
> expression (P)+1 points one past the last element of the array object, 
> and if the expression Q points one past the last element of an array 
> object, the expression (Q)-1 points to the last element of the array 
> object. If both the pointer operand and the result point to elements 
> of the same array object, or one past the last element of the array 
> object, the evaluation shall not produce an overflow; otherwise, the 
> behavior is undefined. If the result points one past the last element 
> of the array object, it shall not be used as the operand of a unary * 
> operator that is evaluated.

I certainly see your point, but I'm not sure it helps. It is true that 
((int*) &x), not pointing to an element of an array object, behaves like 
a pointer to the first element of an array of length one. Thus, forming 
(((int*) &x) + 1) is valid, yielding a one-past-the-end pointer, but that 
pointer cannot be used as the operand of a unary * that is evaluated. 
That's not exactly what is going on here, but I imagine one could argue 
some equivalence.

However, the example can also be written as:

       struct X { int a, b; };
       X x { 50, 100 };
       X *o = (X*) &x.b;

       int a_is_b = o->a; // This is UB (or so we say)?

and then the pointer arithmetic considerations don't seem to apply.
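
To make the comparison concrete, here is a minimal sketch showing that 
both formulations name the same address. It assumes a typical ABI with 
no padding between a and b (checked by the static_assert). The helper 
names are hypothetical and not part of the patch. The sketch never 
performs the disputed o->a access; it only forms addresses and reads b 
through an int lvalue.

```cpp
#include <cstddef>
#include <cstdint>

struct X { int a, b; };

// Assumption: b sits immediately after a, as on mainstream ABIs.
static_assert(offsetof(X, b) == sizeof(int), "unexpected padding in X");

// Hypothetical helper: the address formed by the pointer-arithmetic version.
// (int*)&x points to x.a; adding 1 forms a one-past-the-end pointer,
// which is valid to form but is never dereferenced here.
inline std::uintptr_t addr_via_arithmetic(X &x) {
  return reinterpret_cast<std::uintptr_t>(reinterpret_cast<int *>(&x) + 1);
}

// Hypothetical helper: the address formed by the member-access version.
// No pointer arithmetic is involved.
inline std::uintptr_t addr_via_member(X &x) {
  return reinterpret_cast<std::uintptr_t>(&x.b);
}

// Uncontroversial read of the same storage through an int lvalue.
inline int read_b(X &x) {
  int *p = &x.b;
  return *p;
}
```

Under that layout assumption the two helpers return the same value, so 
the only difference between the two examples is how the int* was 
obtained, not where the subsequent access through X* lands.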

Thanks again,
Hal

>
> -Krzysztof
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory