[cfe-dev] [PATCH] Fix for bug 21725: wrong results with union and strict-aliasing

Daniel Berlin dberlin at dberlin.org
Wed Mar 18 12:03:26 PDT 2015

On Wed, Mar 18, 2015 at 11:43 AM, Richard Smith <richard at metafoo.co.uk>

> On Wed, Mar 18, 2015 at 2:22 AM, Jeroen Dobbelaere <
> jeroen.dobbelaere at gmail.com> wrote:
>> On Tue, Mar 17, 2015 at 11:55 PM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>> [..]
>>> In theory, the last time i remember, you weren't allow to set one member
>>> of a union and read another.
>>> But uh, that's not real user code :)
>>> (and IIRC, it does not say anything real)
>> This is indeed correct. The main issue is that llvm thinks it is allowed
>> to reorder 'stores' (as it thinks they are not aliasing),
>> so that a legal read afterwards will result in a wrong answer (if an
>> optimization or the backend reorders the stores based on the wrong aliasing
>> information).
>>>> If you access a member (or nested member) of a union, starting from the
>>>> union itself, then it depends if the other type is also accessible through
>>>> the union.
>>>> So:
>>>> int foo(union foo* a, float* b, int* c) {
>>>>   a->a=1;
>>>>   *b=2;
>>>>   // compiler must assume a->a and *b can alias
>>>>   // compiler must not assume *b and *c alias (access not through union)
>>>> }
>>>> (Also see section 3.10 of the c++03 standard;
>>> This, IMHO, does not say what you seem to think it does :)
>>> For C++03,  3.10 only includes the word "union" here: "If a program
>>> attempts to access the stored value of an object through an lvalue of other
>>> than one of the following types the behavior is undefined:
>>> — the dynamic type of the object,
>>> — a cv-qualified version of the dynamic type of the object,
>>> — a type that is the signed or unsigned type corresponding to the
>>> dynamic type of the object,
>>>  — a type that is the signed or unsigned type corresponding to a
>>> cv-qualified version of the dynamic type of the object,
>>> — an aggregate or union type that includes one of the aforementioned
>>> types among its members (including, recursively, a member of a subaggregate
>>> or contained union),
>>>  — a type that is a (possibly cv-qualified) base class type of the
>>> dynamic type of the object,
>>>  — a char or unsigned char type."
>>> C++ standard experts, at least on the GCC side, did not view this as
>>> saying "all accesses must have an explicit union access", but that "It must
>>> be part of a union type", but about whether you try to access it through a
>>> union that doesn't have the right actual types in it.
>>> The type of those objects is right the type of the object. There is,
>>> IMHO,  nothing illegal about those accesses.
>> So, with that interpretation, the mere presence of a 'union { short s;
>> int i; }' (in the same or a different compilation unit) should influence
>> the (type based) aliasing of all short and int pointers ?
> That's not what this wording literally says (nor, I think, what it
> intends). The quoted text is specifically talking about the type of the
> lvalue used to perform the access. In C++, this is almost never a struct or
> union type; the only case where that can happen is when copying the object
> representation of a union in a case like this:
>   union A { int i; short s; };
>   A a;
>   A f() { return a; } // here, we copy the representation of 'a' through
> an lvalue of type 'A'.
> (In C, where largely the same wording appears, this bullet is relevant in
> more cases, since copies of structs in C are performed directly by copying
> the object and not via a copy constructor.)
> In a case like 'ihateunions', the expression '*a = 0;' would use an lvalue
> of type 'int' and the expression 'x = *b;' would use an lvalue of type
> 'float', so those two accesses are known to not access the same object by
> 3.10, irrespective of what aggregate or union types exist elsewhere in the
> program. Also note that C++ does not specify any way to change the active
> member of a union other than via a placement new-expression (and even that
> mechanism is only implied to work, never normatively stated) so these
> assignments presumably do not change the active member, and thus at least
> one of them must have undefined behavior (unlike C, C++ does not imbue
> storage with an effective type that can change through such an assignment;
> instead, it has objects whose lifetimes are started through declarations,
> when temporaries are created, and through new-expressions). As a sane
> implementation, we absolutely should guarantee that accessing a union
> member of primitive type and then immediately assigning to the result of
> that member access expression does change the active member, but we are not
> required to guarantee the same happens when the access is laundered through
> a function, according to the current C++ standard's wording, and attempting
> to provide such a guarantee seems largely pointless.

FWIW: I'm perfectly fine with this, as it lets me optimize more.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150318/29ebdd7b/attachment.html>

More information about the cfe-commits mailing list