[cfe-dev] [PATCH] Fix for bug 21725: wrong results with union and strict-aliasing

Richard Smith richard at metafoo.co.uk
Wed Mar 18 11:43:52 PDT 2015


On Wed, Mar 18, 2015 at 2:22 AM, Jeroen Dobbelaere <
jeroen.dobbelaere at gmail.com> wrote:

>
>
> On Tue, Mar 17, 2015 at 11:55 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>> [..]
>>
>> In theory, the last time i remember, you weren't allow to set one member
>> of a union and read another.
>> But uh, that's not real user code :)
>>
>> (and IIRC, it does not say anything real)
>>
>>
> This is indeed correct. The main issue is that llvm thinks it is allowed
> to reorder 'stores' (as it thinks they are not aliasing),
> so that a legal read afterwards will result in a wrong answer (if an
> optimization or the backend reorders the stores based on the wrong aliasing
> information).
>
>
>
>>> If you access a member (or nested member) of a union, starting from the
>>> union itself, then it depends if the other type is also accessible through
>>> the union.
>>>
>>>
>>> So:
>>>
>>> int foo(union foo* a, float* b, int* c) {
>>>   a->a=1;
>>>   *b=2;
>>>   // compiler must assume a->a and *b can alias
>>>   // compiler must not assume *b and *c alias (access not through union)
>>> }
>>>
>>> (Also see section 3.10 of the c++03 standard;
>>>
>>
>>
>> This, IMHO, does not say what you seem to think it does :)
>>
>> For C++03,  3.10 only includes the word "union" here: "If a program
>> attempts to access the stored value of an object through an lvalue of other
>> than one of the following types the behavior is undefined:
>>
>> — the dynamic type of the object,
>> — a cv-qualified version of the dynamic type of the object,
>> — a type that is the signed or unsigned type corresponding to the dynamic
>> type of the object,
>>  — a type that is the signed or unsigned type corresponding to a
>> cv-qualified version of the dynamic type of the object,
>> — an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a subaggregate
>> or contained union),
>>  — a type that is a (possibly cv-qualified) base class type of the
>> dynamic type of the object,
>>  — a char or unsigned char type."
>>
>>
>> C++ standard experts, at least on the GCC side, did not view this as
>> saying "all accesses must have an explicit union access", but that "It must
>> be part of a union type", but about whether you try to access it through a
>> union that doesn't have the right actual types in it.
>>
>> The type of those objects is right the type of the object. There is,
>> IMHO,  nothing illegal about those accesses.
>>
>>
>>
> So, with that interpretation, the mere presence of a 'union { short s; int
> i; }' (in the same or a different compilation unit) should influence the
> (type based) aliasing of all short and int pointers ?
>

That's not what this wording literally says (nor, I think, what it
intends). The quoted text is specifically talking about the type of the
lvalue used to perform the access. In C++, this is almost never a struct or
union type; the only case where that can happen is when copying the object
representation of a union in a case like this:

  union A { int i; short s; };
  A a;
  A f() { return a; } // here, we copy the representation of 'a' through an
lvalue of type 'A'.

(In C, where largely the same wording appears, this bullet is relevant in
more cases, since copies of structs in C are performed directly by copying
the object and not via a copy constructor.)

In a case like 'ihateunions', the expression '*a = 0;' would use an lvalue
of type 'int' and the expression 'x = *b;' would use an lvalue of type
'float', so those two accesses are known to not access the same object by
3.10, irrespective of what aggregate or union types exist elsewhere in the
program. Also note that C++ does not specify any way to change the active
member of a union other than via a placement new-expression (and even that
mechanism is only implied to work, never normatively stated) so these
assignments presumably do not change the active member, and thus at least
one of them must have undefined behavior (unlike C, C++ does not imbue
storage with an effective type that can change through such an assignment;
instead, it has objects whose lifetimes are started through declarations,
when temporaries are created, and through new-expressions). As a sane
implementation, we absolutely should guarantee that accessing a union
member of primitive type and then immediately assigning to the result of
that member access expression does change the active member, but we are not
required to guarantee the same happens when the access is laundered through
a function, according to the current C++ standard's wording, and attempting
to provide such a guarantee seems largely pointless.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150318/f1e08766/attachment.html>


More information about the cfe-dev mailing list