[PATCH] Fix for bug 21725: wrong results with union and strict-aliasing

Daniel Berlin dberlin at dberlin.org
Wed Mar 18 12:02:39 PDT 2015


On Wed, Mar 18, 2015 at 9:04 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>
>
> ------------------------------
>
> *From: *"Daniel Berlin" <dberlin at dberlin.org>
> *To: *"Jeroen Dobbelaere" <jeroen.dobbelaere at gmail.com>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>,
> reviews+D8056+public+7b63cedb34d51bb1 at reviews.llvm.org,
> cfe-commits at cs.uiuc.edu, "cfe-dev at cs.uiuc.edu Developers" <
> cfe-dev at cs.uiuc.edu>
> *Sent: *Wednesday, March 18, 2015 10:27:28 AM
> *Subject: *Re: [PATCH] Fix for bug 21725: wrong results with union and
> strict-aliasing
>
>
>
> On Wed, Mar 18, 2015 at 2:22 AM, Jeroen Dobbelaere <
> jeroen.dobbelaere at gmail.com> wrote:
>
>>
>>
>> On Tue, Mar 17, 2015 at 11:55 PM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>
>>> [..]
>>>
>>> In theory, the last time i remember, you weren't allow to set one member
>>> of a union and read another.
>>> But uh, that's not real user code :)
>>>
>>> (and IIRC, it does not say anything real)
>>>
>>>
>> This is indeed correct. The main issue is that llvm thinks it is allowed
>> to reorder 'stores' (as it thinks they are not aliasing),
>> so that a legal read afterwards will result in a wrong answer (if an
>> optimization or the backend reorders the stores based on the wrong aliasing
>> information).
>>
>
> I thought both LLVM and GCC guaranteed type punning through unions would
> work?
>
>
> Yes, LLVM does this (at least for "local" type punning, for some heuristic
> definition of local), because we always allow BasicAA to override TBAA when
> BasicAA is *sure* there is aliasing.
>
>
>
>>
>>
>>
>>>> If you access a member (or nested member) of a union, starting from the
>>>> union itself, then it depends if the other type is also accessible through
>>>> the union.
>>>>
>>>>
>>>> So:
>>>>
>>>> int foo(union foo* a, float* b, int* c) {
>>>>   a->a=1;
>>>>   *b=2;
>>>>   // compiler must assume a->a and *b can alias
>>>>   // compiler must not assume *b and *c alias (access not through union)
>>>> }
>>>>
>>>> (Also see section 3.10 of the c++03 standard;
>>>>
>>>
>>>
>>> This, IMHO, does not say what you seem to think it does :)
>>>
>>> For C++03,  3.10 only includes the word "union" here: "If a program
>>> attempts to access the stored value of an object through an lvalue of other
>>> than one of the following types the behavior is undefined:
>>>
>>> — the dynamic type of the object,
>>> — a cv-qualified version of the dynamic type of the object,
>>> — a type that is the signed or unsigned type corresponding to the
>>> dynamic type of the object,
>>>  — a type that is the signed or unsigned type corresponding to a
>>> cv-qualified version of the dynamic type of the object,
>>> — an aggregate or union type that includes one of the aforementioned
>>> types among its members (including, recursively, a member of a subaggregate
>>> or contained union),
>>>  — a type that is a (possibly cv-qualified) base class type of the
>>> dynamic type of the object,
>>>  — a char or unsigned char type."
>>>
>>>
>>> C++ standard experts, at least on the GCC side, did not view this as
>>> saying "all accesses must have an explicit union access", but that "It must
>>> be part of a union type", but about whether you try to access it through a
>>> union that doesn't have the right actual types in it.
>>>
>>> The type of those objects is right the type of the object. There is,
>>> IMHO,  nothing illegal about those accesses.
>>>
>>>
>>>
>> So, with that interpretation, the mere presence of a 'union { short s;
>> int i; }' (in the same or a different compilation unit) should influence
>> the (type based) aliasing of all short and int pointers ?
>>
>
> It's actually worse than this interpretation, since someone pointed out on
> the gcc side that "ihateunions" could be in a different TU, and you'd never
> see the union at all.
> While I think this is the right interpretation (as do others), that's not
> the point.
> I'm actually on your side here :)
>
> But remember that Hal said  that if hte language has semantics, we should
> follow them.
>
>
> I did. It is also true that standards have defects ;) If a strict reading
> of the language semantics implies a 'spooky action at a distance' effect,
> then we'll need to draw a line in the sand somewhere to avoid that. I also
> think your interpretation that implies that is incorrect...
>
>
> The whole point of this is essentially to point out that the language
> semantics weren't completely thought out here :)
> Because of that, our options are either "make up a rule people can follow"
> or "disable TBAA".
>
> GCC, and every other compiler i'm aware of, has chosen the former, at
> least for C++03 (I don't know what C++11 or 14 say here).
>
>
> Yes; fair enough. What I want is that if we make up a rule, then we make
> that rule well defined and documented (not just whatever the implementation
> happens to do today); and we should also write a paper for the C++
> Standards Committee proposing that as an official solution (I'll be happy
> to present at the next meeting, and we have approximately 1 month before
> the next paper deadline, IIRC).
>
> In the current draft, I relevant wording looks equivalent:
>
> — an aggregate or union type that includes one of the aforementioned types
> among its elements
>     or non-static data members (including, recursively, an element or
> non-static data member of
>     a subaggregate or contained union),
>
>
> However, another point to consider is that, "C++ standard experts, at
> least on the GCC side, did not view this as saying "all accesses must have
> an explicit union access", but that..." may not be a reasonable reading of
> the standard. The text says, "If a program attempts to access the stored
> value of an object through a glvalue of other than one
> of the following types the behavior is undefined: ...", and so I read that
> as saying that the access must be through a glvalue of an aggregate or
> union type (with a relevant type as one of its members). So the union or
> aggregate type *must* be explicitly present in the access. I don't believe
> any other interpretations, especially those which imply non-local effects,
> are reasonable.
>
> Exactly what rule did GCC implement?
>

If you use an explicit union access, it assumes they alias,  otherwise, all
bets are off.

Note: As you all know, I have really no dog in this fight, nor am i a
language lawyer. Among other things, i'm an aliasing guy. I just do the
edge of  what the language lawyers tell me is allowed :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150318/42d465f9/attachment.html>


More information about the cfe-commits mailing list