[llvm-dev] RFC: Representing unions in TBAA

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Mon Feb 13 19:39:03 PST 2017

On Mon, Feb 13, 2017 at 10:07 AM, Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:

> On Mon, Feb 13, 2017 at 2:23 AM, Daniel Berlin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>>> I don't think this fully solves the problem -- you'll also need to fix
>>> getMostGenericTBAA.  That is, even if you implement the above scheme,
>>> say you started out with:
>>> union U {
>>>   int i;
>>>   float f;
>>> };
>>> float f(union U *u, int *ii, float *ff, bool c) {
>>>   if (c) {
>>>     *ii = 10;
>>>     *ff = 10.0;
>>>   } else {
>>>     u->i = 10;    // S0
>>>     u->f = 10.0;  // S1
>>>   }
>>>   return u->f;
>>> }
>>> (I presume you're trying to avoid reordering S0 and S1?)
>>> SimplifyCFG or some other such pass may transform f to:
>>> float f(union U *u, int *ii, float *ff, bool c) {
>>>   int *iptr = c ? ii : &(u->i);
>>>   float *fptr = c ? ff : &(u->f);
>>>   *iptr = 10;     // S2
>>>   *fptr = 10.0;   // S3
>>>   return u->f;
>>> }
>>> then getMostGenericTBAA will infer scalar "int" TBAA for S2 and scalar
>>> "float" TBAA for S3, which will be NoAlias and allow the reordering
>>> you were trying to avoid.
>> FWIW, I have to read this in detail, but a few things pop out at me.
>> 1. We would like to live in a world where we don't depend on TBAA
>> overriding BasicAA to get correct answers.  We do now, but don't want to.
>> Hopefully this proposal does not make that impossible.
>> 2.  Literally the only way that GCC ends up getting this right is
>> twofold:
>> It only guarantees things about direct access through union.
>> If you take the address of the union member (like the transform above),
>> it knows it will get a wrong answer.
>> So what it does is it finds the type it has to stop at (here, the union)
>> to keep the TBAA set the same, and makes the transform end there.
>> So the above would not occur.
>> 3. A suggestion that TBAA follow all possible paths seems .. very slow.
>> 4. "The main motivation for this is functional correctness of code using
>> unions".  I believe you mean "with tbaa and strict-aliasing on".
>> If not, functional correctness for unions should not be in any way related
>> to requiring TBAA.
>> 5. Unions are among the worst areas of the standard in terms of "nobody
>> has really thought super-hard about the interaction of aliasing and unions
>> in a way that is coherent".
>> So when you say things like 'necessary for functional correctness of
>> unions', just note that this is pretty much wrong.  You probably mean
>> "necessary for a reasonable interpretation" or something.
>> Because we would be *functionally correct* by the standard by destroying
>> the program if you ever read the member you didn't set :)
> C11 subclause 6.5.2.3 paragraph 3, has in footnote 95:
> If the member used to read the contents of a union object is not the same
> as the member last used to store a value in the object, the appropriate
> part of the object representation of the value is reinterpreted as an
> object representation in the new type as described in 6.2.6 (a process
> sometimes called "type punning"). This might be a trap representation.
> So, the intent is at least that the use of the . operator or the ->
> operator to access a member of a union would "safely" perform type punning.
Certainly, if you can quote this, you know this is new to C11 (and newer
versions of C++).

It was explicitly *not* true in earlier versions.
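
A minimal sketch of the direct-access case that footnote describes, next to
the memcpy form that does not depend on the union rule at all. The helper
names bits_via_union and bits_via_memcpy are hypothetical, and the sketch
assumes float and uint32_t have the same size:

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

union U {
  uint32_t i;
  float f;
};

/* Store through one member, then read through the other, both via the
   union object itself. This is the direct access through '.' that the
   C11 footnote describes as reinterpreting the object representation. */
uint32_t bits_via_union(float x) {
  union U u;
  u.f = x;
  return u.i;
}

/* Copy the bytes instead. This does not rely on the union rule, so it
   does not depend on which reading of the standard applies. */
uint32_t bits_via_memcpy(float x) {
  uint32_t out;
  memcpy(&out, &x, sizeof out);
  return out;
}
```

Both helpers should return the same bits for the same input. Once the
addresses of u.i and u.f escape as plain int * and float * pointers, as in
the transformed f() quoted above, neither the footnote nor the GCC behavior
described earlier covers the access.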

They've also slowly cleaned up the aliasing rules, but, honestly, still a