[LLVMdev] Strange pointer aliasing behaviour
Jeffrey Yasskin
jyasskin at google.com
Thu Jun 17 11:27:21 PDT 2010
On Thu, Jun 17, 2010 at 10:56 AM, David Vandevoorde
<daveed at vandevoorde.com> wrote:
>
> On Jun 17, 2010, at 1:34 PM, Eugene Toder wrote:
>
>>> Do you have a reference to the standard that makes it undefined?
>>
>> I second this question. I tried to find anything banning calculating
>> address of one field from address of another in the standard some time
>> ago, but could not find it.
>
>
> In the current C++0x FCD, 5.7/5:
>
> "When an expression that has integral type is added to or subtracted
> from a pointer, the result has the type of the pointer operand.
> If the pointer operand points to an element of an array object, and
> the array is large enough, the result points to an element offset
> from the original element such that the difference of the subscripts
> of the resulting and original array elements equals the integral
> expression. In other words, if the expression P points to the i-th
> element of an array object, the expressions (P)+N (equivalently,
> N+(P)) and (P)-N (where N has the value n) point to, respectively,
> the i + n-th and i - n-th elements of the array object, provided
> they exist. Moreover, if the expression P points to the last element
> of an array object, the expression (P)+1 points one past the last
> element of the array object, and if the expression Q points one past
> the last element of an array object, the expression (Q)-1 points to
> the last element of the array object. If both the pointer operand
> and the result point to elements of the same array object, or one
> past the last element of the array object, the evaluation shall not
> produce an overflow; otherwise, the behavior is undefined."
>
> (Note in particular the last phrase, and recall that subscripting is defined in terms of pointer arithmetic.)
>
So the proper way to compute the header's address is

  _Rep* _M_rep(char* data) {
    return reinterpret_cast<_Rep*>(intptr_t(data) - sizeof(_Rep));
  }

as opposed to the

  _Rep* _M_rep(char* data) {
    return &(reinterpret_cast<_Rep*>(data)[-1]);
  }

that gcc-4.2 through gcc-4.4 use (I didn't check others). Although,
since the _Rep object is not an element of an array (so for pointer
arithmetic it behaves like an array of length one), and '(_Rep*)data'
points one-past-the-end, a -1 subscript actually looks legal.

But even if it's illegal, this only affects the "inbounds" qualifier
on the GEP, not the aliasing rules, which the original question was
about. I think the version that adjusts the pointer through intptr_t
obeys all the aliasing rules.
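
For anyone who wants to try the two forms side by side, here is a minimal self-contained sketch. It is not the libstdc++ code: Rep, allocate_with_header, rep_via_integer, rep_via_subscript and release are hypothetical names made up for illustration, and the header fields are invented. It uses uintptr_t rather than intptr_t only to avoid mixing signed and unsigned operands in the subtraction. It also assumes a flat address space where the pointer-to-integer mapping (which is implementation-defined) preserves byte offsets. The subscript version is included purely for comparison; whether it is well-defined is exactly the question discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a string header; the fields are illustrative only.
struct Rep {
    std::size_t length;
    std::size_t capacity;
};

// Allocates one block holding a Rep header followed by `capacity` bytes,
// and returns a pointer to the character data just past the header.
char* allocate_with_header(std::size_t capacity) {
    void* block = std::malloc(sizeof(Rep) + capacity);
    assert(block != nullptr);
    Rep* rep = new (block) Rep{0, capacity};
    return reinterpret_cast<char*>(rep + 1);
}

// Integer round-trip: convert the pointer to an integer, step back by the
// header size, and convert back. Assumes a flat address space.
Rep* rep_via_integer(char* data) {
    return reinterpret_cast<Rep*>(
        reinterpret_cast<std::uintptr_t>(data) - sizeof(Rep));
}

// Negative-subscript form, in the style of the gcc code quoted above.
Rep* rep_via_subscript(char* data) {
    return &(reinterpret_cast<Rep*>(data)[-1]);
}

// Frees the whole block, given the data pointer handed out earlier.
void release(char* data) {
    std::free(rep_via_integer(data));
}
```

On mainstream platforms both functions should return the same address for a pointer obtained from allocate_with_header; the difference under discussion is only in what the standard (and hence the optimizer) is allowed to assume about each form.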