[LLVMdev] Strange pointer aliasing behaviour

Jeffrey Yasskin jyasskin at google.com
Thu Jun 17 11:27:21 PDT 2010


On Thu, Jun 17, 2010 at 10:56 AM, David Vandevoorde
<daveed at vandevoorde.com> wrote:
>
> On Jun 17, 2010, at 1:34 PM, Eugene Toder wrote:
>
>>> Do you have a reference to the standard that makes it undefined?
>>
>> I second this question. I tried to find anything banning calculating
>> address of one field from address of another in the standard some time
>> ago, but could not find it.
>
>
> In the current C++0x FCD, 5.7/5:
>
>        "When an expression that has integral type is added to or subtracted
>         from a pointer, the result has the type of the pointer operand.
>         If the pointer operand points to an element of an array object, and
>         the array is large enough, the result points to an element offset
>         from the original element such that the difference of the subscripts
>         of the resulting and original array elements equals the integral
>         expression. In other words, if the expression P points to the i-th
>         element of an array object, the expressions (P)+N (equivalently,
>         N+(P)) and (P)-N (where N has the value n) point to, respectively,
>         the i + n-th and i - n-th elements of the array object, provided
>         they exist. Moreover, if the expression P points to the last element
>         of an array object, the expression (P)+1 points one past the last
>         element of the array object, and if the expression Q points one past
>         the last element of an array object, the expression (Q)-1 points to
>         the last element of the array object. If both the pointer operand
>         and the result point to elements of the same array object, or one
>         past the last element of the array object, the evaluation shall not
>         produce an overflow; otherwise, the behavior is undefined."
>
> (Note in particular the last phrase, and recall that subscripting is defined in terms of pointer arithmetic.)
>

So the proper way to compute the header's address is

_Rep* _M_rep(char* data) {
  return reinterpret_cast<_Rep*>(intptr_t(data) - sizeof(_Rep));
}

as opposed to the

_Rep* _M_rep(char* data) {
  return &(reinterpret_cast<_Rep*>(data)[-1]);
}

that gcc-4.2 through gcc-4.4 use (I didn't check others). That said,
a _Rep object that is not an array element is treated as an array of
length one for pointer arithmetic (5.7/4), and '(_Rep*)data' points
one past its end, so a -1 subscript actually looks legal.

But even if it's illegal, this only affects the "inbounds" qualifier
on the GEP, not the aliasing rules, which the original question was
about. I think the version that adjusts the pointer through intptr_t
obeys all the aliasing rules.
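
To make the comparison concrete, here is a minimal sketch of the two
forms side by side. The Rep struct and its members are hypothetical
stand-ins (the real libstdc++ _Rep differs), and the function names
are made up for illustration. It also assumes the usual flat address
space, where the pointer-to-integer conversion is a plain address;
strictly speaking that mapping is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the string header; not the real _Rep layout.
struct Rep {
  std::size_t length;
  std::size_t capacity;
  int refcount;
};

// Integer round trip: the subtraction happens on an integer, so no
// pointer arithmetic (and no 5.7/5 bounds rule) is involved.
Rep* rep_via_integer(char* data) {
  return reinterpret_cast<Rep*>(reinterpret_cast<std::intptr_t>(data) -
                                static_cast<std::intptr_t>(sizeof(Rep)));
}

// Subscript form, as quoted above from gcc-4.2 through gcc-4.4.
Rep* rep_via_subscript(char* data) {
  return &(reinterpret_cast<Rep*>(data)[-1]);
}

// Places a header and its character data in one allocation, then checks
// that both functions recover the header's address from the data pointer.
bool both_recover_header() {
  const std::size_t n = 16;
  void* block = std::malloc(sizeof(Rep) + n);
  if (block == nullptr) return false;
  Rep* header = new (block) Rep{0, n, 0};
  char* data = reinterpret_cast<char*>(header + 1);  // one past the header
  bool ok = rep_via_integer(data) == header &&
            rep_via_subscript(data) == header;
  std::free(block);
  return ok;
}
```

On common implementations both functions should return the same
address; the difference is only in which language rules the
computation has to satisfy, and so whether the front end may mark the
resulting GEP "inbounds".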


