[PATCH] Fix for bug 21725: wrong results with union and strict-aliasing

Daniel Berlin dberlin at dberlin.org
Tue Mar 17 17:42:49 PDT 2015


On Tue, Mar 17, 2015 at 3:55 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

>
>
> On Tue, Mar 17, 2015 at 3:32 PM, Jeroen Dobbelaere <
> jeroen.dobbelaere at gmail.com> wrote:
>
>>
>>
>> On Tue, Mar 17, 2015 at 11:15 PM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>
>>>
>>> [..]
>>> I don't understand what this means. How should they do that?
>>>
>>> So, his point was the arrays have no connections to the union. This is
>>> not the only case this occurs in.
>>>
>>> Let's take the canonical example:
>>>
>>> union foo {
>>> int a;
>>> float b;
>>> };
>>>
>>> int ihateunions(int *a, float *b)
>>> {
>>> <do a thing with a and b>
>>> }
>>>
>>> int passtounion()
>>> {
>>> union foo bar;
>>> ihateunions(&bar.a, &bar.b);
>>>
>>> }
>>>
>>> Inside ihateunions, you will have no idea that a and b are connected to
>>> unions.
>>>
>>> Let's say this example is easy, i mean, they are both at offset 0, so
>>> they can alias, right?
>>>
>>>
>> My understanding is that if you access members of a union in this way,
>> the compiler is allowed
>> to assume that a and b do not alias.
>>
>
>
> In theory, as far as i remember, you weren't allowed to set one member
> of a union and read another.
> But uh, that's not real user code :)
>
> (and IIRC, it does not say anything real)
>
>
>>
>> If you access a member (or nested member) of a union, starting from the
>> union itself, then it depends if the other type is also accessible through
>> the union.
>>
>>
>> So:
>>
>> int foo(union foo* a, float* b, int* c) {
>>   a->a=1;
>>   *b=2;
>>   // compiler must assume a->a and *b can alias
>>   // compiler must not assume *b and *c alias (access not through union)
>> }
>>
>> (Also see section 3.10 of the c++03 standard;
>>
>
>
> This, IMHO, does not say what you seem to think it does :)
>
> For C++03, 3.10 only includes the word "union" here: "If a program
> attempts to access the stored value of an object through an lvalue of other
> than one of the following types the behavior is undefined:
>
> - the dynamic type of the object,
> - a cv-qualified version of the dynamic type of the object,
> - a type that is the signed or unsigned type corresponding to the dynamic
>   type of the object,
> - a type that is the signed or unsigned type corresponding to a
>   cv-qualified version of the dynamic type of the object,
> - an aggregate or union type that includes one of the aforementioned types
>   among its members (including, recursively, a member of a subaggregate or
>   contained union),
> - a type that is a (possibly cv-qualified) base class type of the dynamic
>   type of the object,
> - a char or unsigned char type."
>
>
> C++ standard experts, at least on the GCC side, did not view this as
> saying "all accesses must have an explicit union access" or "it must
> be part of a union type". They read it as being about whether you try to
> access the object through a union that doesn't have the right actual types
> in it.
>
> The type of each of those lvalues is exactly the type of the object it
> accesses. There is, IMHO, nothing illegal about those accesses.
>
>
BTW, the example I gave is trivially transformable into one on structs:


struct foo {
int a;
float b;
};

int ihatestructs(int *a, float *b)
{
<do a thing with a and b>
}

int passtostruct()
{
struct foo bar;
ihatestructs(&bar.a, &bar.b);
}
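
To make the placeholder concrete, here is a minimal sketch in C of one possible body. The two stores and the final read are assumptions chosen for illustration; the example above leaves the body unspecified. Because `a` and `b` point at distinct struct members, the store through `b` cannot change `*a`, so the function returns 1 whether or not the compiler reloads `*a` after the second store.

```c
#include <assert.h>

struct foo {
    int a;
    float b;
};

/* Hypothetical body for the placeholder: store through both pointers,
 * then read back through the first one. */
int ihatestructs(int *a, float *b)
{
    *a = 1;
    *b = 2.0f;
    return *a; /* distinct struct members: the store to *b cannot change *a */
}

int passtostruct(void)
{
    struct foo bar;
    /* The two members of a struct never share an address. */
    assert((void *)&bar.a != (void *)&bar.b);
    return ihatestructs(&bar.a, &bar.b);
}
```

Calling `passtostruct()` returns 1 under these assumptions.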

Given 3.10 says literally nothing different about accesses through unions
and accesses through aggregates (they are even in the same sentence), i
don't see how you can reason your way to saying that unions give a different
result than structs in the above case.

That is, strictly on a TBAA basis, i don't see how you believe the standard
allows different answers.
(Practically, i know why we are okay with them :P)

Now, TBAA combined with other info, i can see. This is because in the
above example, with TBAA you can say "TBAA says *a and *b in ihatestructs
can't alias unless they were accessed through a structure that contained
both". So far so good (though note, this alone gives you *nothing* about
the above unless you are guaranteed to have the whole program in front of
you). Struct rules tell us that a and b must not be at the same offset,
and memory layout rules tell us that objects at different offsets cannot
alias. So, combined, we have "TBAA says *a and *b can't alias unless they
are in a structure, and if they were in a structure, they can't alias because
they aren't at the same offset". These two things combined tell you *a and
*b can't alias.
But either one alone does not.
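
The layout half of that argument can be checked directly. The following is a minimal sketch in C, not anything a compiler actually runs; `ranges_overlap` and `struct_members_overlap` are hypothetical helper names introduced here. It shows that the byte ranges occupied by the two struct members are disjoint.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int a;
    float b;
};

/* Returns 1 if the byte ranges [off1, off1 + size1) and
 * [off2, off2 + size2) share at least one byte, 0 otherwise. */
static int ranges_overlap(size_t off1, size_t size1, size_t off2, size_t size2)
{
    return off1 < off2 + size2 && off2 < off1 + size1;
}

/* Returns 1 if members a and b of struct foo occupy overlapping bytes. */
int struct_members_overlap(void)
{
    /* Struct members are laid out in declaration order, so b starts
     * at or after the end of a. */
    assert(offsetof(struct foo, b) >= offsetof(struct foo, a) + sizeof(int));
    return ranges_overlap(offsetof(struct foo, a), sizeof(int),
                          offsetof(struct foo, b), sizeof(float));
}
```

Here `struct_members_overlap()` returns 0, which is the "different offsets" fact that TBAA alone does not provide.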

This is the problem with unions. You don't get the second part easily,
because they may, in fact, be at the same offset.
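
For contrast, a similar sketch in C for the union case; the function names are hypothetical and introduced only for this illustration. It only compares offsets and addresses and never reads a member, so it stays clear of the question of which accesses are legal.

```c
#include <assert.h>
#include <stddef.h>

union foo {
    int a;
    float b;
};

/* Returns 1 if both members start at the same byte offset. */
int union_members_share_offset(void)
{
    /* Every member of a union starts at the beginning of the union. */
    assert(offsetof(union foo, a) == 0);
    return offsetof(union foo, a) == offsetof(union foo, b);
}

/* Returns 1 if pointers to the two members designate the same address. */
int union_member_pointers_equal(void)
{
    union foo bar;
    return (void *)&bar.a == (void *)&bar.b;
}
```

Both functions return 1, so the offset-based reasoning that rules out aliasing for the struct members is not available for the union members.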