[PATCH] Fix for bug 21725: wrong results with union and strict-aliasing

Jeroen Dobbelaere jeroen.dobbelaere at gmail.com
Fri Mar 20 09:06:23 PDT 2015


In the following set of small functions, the goal is to identify the
behavior we want from llvm with respect to aliasing (assuming a 32-bit
architecture).

For every function, I have three fields:
- c++11:  my interpretation of the C++11/14 standard, based on a previous
          mail from Richard Smith (that is: a combination of the known
          layout of a structure/union/array and the types; the relevant
          section for the types is 3.10/10)
- llvm:   what llvm deduces today
- future: what I would like to see in the future

The question to ask is: when should it be valid to REORDER the stores,
assuming that after the function call the second object is accessed (read
from) in a legal way?
- NoAlias:   we are allowed to reorder the stores
- MayAlias:  we are _not_ allowed to reorder the stores
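To make the question concrete, here is a minimal sketch (the helper names
are hypothetical, not part of the tests below). It performs the two stores
of test_p00a in source order and in swapped order, then returns what a
later read of *b observes. For distinct objects both orders agree
(reordering is harmless); when a and b point to the same int they differ,
so the order must be kept.

```c
/* Hypothetical helper: the stores of test_p00a in source order,
   followed by the later read of *b. */
static int stores_in_order(int *a, int *b)
{
  *a = 1;
  *b = 2;
  return *b;
}

/* Hypothetical helper: the same stores, swapped. */
static int stores_swapped(int *a, int *b)
{
  *b = 2;
  *a = 1;
  return *b;
}
```

With `int x, y;`, both helpers return 2 for `(&x, &y)`. With `(&x, &x)`
the swapped version returns 1, which is exactly the observable difference
that a MayAlias answer protects against.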

For the 'future' field, my personal guideline is:
- if a union (sub)member is accessed (full path) and the union (or one of
  its members) contains a type that matches the other access (taking the
  access path into account), then aliasing must be assumed (possibly
  taking the offset/size of the access into account)
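As a rough illustration only, the type-matching part of this guideline
could be modelled as below. Everything here is hypothetical: the
scalar_type tags stand in for real type information, and access paths,
offsets and sizes are deliberately ignored. This is not how llvm or gcc
represent the rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar-type tags; a simplified stand-in for real types. */
enum scalar_type { T_CHAR, T_SHORT, T_INT, T_LONG };

/* Hypothetical model of the 'future' guideline: returns true when an
   access of type 'accessed' through a union whose members have the types
   members[0..n) must be assumed to alias an access of type 'other'. */
static bool future_union_may_alias(const enum scalar_type *members, size_t n,
                                   enum scalar_type accessed,
                                   enum scalar_type other)
{
  /* A char access may alias anything (3.10/10). */
  if (accessed == T_CHAR || other == T_CHAR)
    return true;
  /* The union contains a member whose type matches the other access. */
  for (size_t i = 0; i < n; ++i)
    if (members[i] == other)
      return true;
  return false;
}
```

For union U00 below (members short and int), this model gives MayAlias
for test_u00b, test_u00e, test_u00f and test_u00z0, and NoAlias for
test_u00i, which matches the 'future' fields of those tests.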

NOTE:  when the standard requires MayAlias and we provide NoAlias, we have
       a 'wrong code' issue;
       when the standard specifies NoAlias and we provide MayAlias, we have
       a possible performance degradation.
NOTE2: for member array accesses of the same type, llvm today always
       assumes that they alias, even if they happen at different offsets.
       This is acceptable, as it does not result in wrong code.
NOTE3: depending on the interpretation of the standard, more or fewer cases
       will have the 'MayAlias' label. I have tried to guess what the
       standard means, based on a previous mail from Richard Smith.
       (Richard, please comment on the cases where I guessed wrong ;) )
NOTE4: it would be good if the standard were extended with a set of
       examples that state explicitly where reorderings are allowed and
       where they are not.
NOTE5: we probably want behavior similar to gcc's, and we assume that gcc
       follows the 'future' rules, but I was not able to deduce any useful
       information with '-fdump-tree-alias'. Probably I missed something :(
       I would be glad if somebody could extend this with information on
       how gcc treats these accesses.
NOTE6: to make unions really useful, llvm allows reading a union member
       with a different type than the one that was used to write it. Once
       we have a correct deduction of the aliasing relation, this should
       also work.


So, please comment if you agree/disagree ;)

Jeroen

--

// --------------------------------------
// plain types
void test_p00a(int* a, int* b)
{
  *a=1;
  *b=2;
  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

void test_p00b(int* a, short* b)
{
  *a=1;
  *b=2;

  // c++11:   NoAlias
  // llvm:    NoAlias
  // future:  NoAlias
}

void test_p00c(int* a, char* b)
{
  *a=1;
  *b=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}
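Why test_p00c has to be MayAlias can be shown with a minimal sketch
(hypothetical helper name): a char* may legally point at one of the bytes
of an int, so the store through b can change the value stored through a.
The resulting int value depends on endianness, but on either byte order it
is no longer 1.

```c
/* Same stores as test_p00c, followed by a read of *a. */
static int store_int_then_byte(int *a, char *b)
{
  *a = 1;
  *b = 2;      /* b may point into the object representation of *a */
  return *a;
}
```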


// --------------------------------------
// struct
struct S00 {
  char mC_0;
  short mS_1;
  int mI_2;
  int mI_3;
};

void test_s00a(struct S00* a, struct S00* b)
{
  a->mI_2=1;
  b->mC_0=2;

  // c++11:   NoAlias
  // llvm:    NoAlias
  // future:  NoAlias
}

void test_s00b(struct S00* a, struct S00* b)
{
  a->mI_2=1;
  b->mI_3=2;

  // c++11:   NoAlias
  // llvm:    NoAlias
  // future:  NoAlias
}

void test_s00c(struct S00* a, struct S00* b)
{
  a->mI_2=1;
  b->mI_2=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

void test_s00d(struct S00* a, int* b)
{
  a->mI_2=1;
  *b=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}


void test_s00f(struct S00* a, short* b)
{
  a->mI_2=1;
  *b=2;

  // c++11:   NoAlias
  // llvm:    NoAlias
  // future:  NoAlias
}

void test_s00z(struct S00* a, char* b)
{
  // may alias
  a->mI_2=1;
  *b=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

// array member
// ------------
struct S03 {
  short mS_0;
  struct S00 mS00_1[4];
  short mS_2;
};


void test_s03b(struct S03* a, struct S03* b)
{
  a->mS00_1[0].mI_2=1;
  b->mS00_1[0].mI_3=2;

  // c++11:   NoAlias
  // llvm:    NoAlias
  // future:  NoAlias
}

void test_s03c(struct S03* a, struct S03* b)
{
  a->mS00_1[0].mI_2=1;
  b->mS00_1[1].mI_2=2;

  // c++11:   NoAlias
  // llvm:    MayAlias ******** performance
  // future:  NoAlias
}
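The NoAlias answer for test_s03c follows from the layout: if a == b, the
two stores target fixed, non-overlapping offsets of the same struct S03;
if a != b, they target two distinct S03 objects. The sketch below (the
helper names are hypothetical; the struct definitions are copied from
above) checks the byte ranges with offsetof.

```c
#include <stdbool.h>
#include <stddef.h>

struct S00 { char mC_0; short mS_1; int mI_2; int mI_3; };
struct S03 { short mS_0; struct S00 mS00_1[4]; short mS_2; };

/* True if [off1, off1 + size1) and [off2, off2 + size2) share a byte. */
static bool ranges_overlap(size_t off1, size_t size1,
                           size_t off2, size_t size2)
{
  return off1 < off2 + size2 && off2 < off1 + size1;
}

/* The two stores of test_s03c, seen as byte ranges of one struct S03. */
static bool s03c_stores_overlap(void)
{
  return ranges_overlap(offsetof(struct S03, mS00_1[0].mI_2), sizeof(int),
                        offsetof(struct S03, mS00_1[1].mI_2), sizeof(int));
}
```

The two offsets differ by exactly sizeof(struct S00), which is always at
least sizeof(int), so an offset-aware analysis could answer NoAlias here.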

// --------------------------------------
// Unions

// --------------------------------------
// Standard union U00
union U00 {
  short mS;
  int   mI;
};

void test_u00a(union U00* a, union U00* b)
{
  a->mS=1;
  b->mS=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

void test_u00b(union U00* a, union U00* b)
{
  a->mS=1;
  b->mI=2;

  // c++11:   MayAlias
  // llvm:    NoAlias  *********** wrong code
  // future:  MayAlias
}
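The following sketch shows what goes wrong in test_u00b when the stores
are swapped while a and b point to the same union (the helper names are
hypothetical). It assumes a 16-bit short and a 32-bit int, which is
checked at compile time. In source order, b->mI reads back 2; in swapped
order, the short store overwrites two bytes of the int, so the result is
not 2 on either byte order.

```c
union U00 {
  short mS;
  int   mI;
};

_Static_assert(sizeof(short) == 2 && sizeof(int) == 4,
               "sketch assumes 16-bit short and 32-bit int");

/* Stores of test_u00b in source order, then the later read of b->mI. */
static int u00b_in_order(union U00 *a, union U00 *b)
{
  a->mS = 1;
  b->mI = 2;
  return b->mI;
}

/* The same stores swapped, as a NoAlias answer would permit. */
static int u00b_swapped(union U00 *a, union U00 *b)
{
  b->mI = 2;
  a->mS = 1;
  return b->mI;
}
```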


void test_u00e(union U00* a, short* b)
{
  a->mS=1;
  *b=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

void test_u00f(union U00* a, short* b)
{
  a->mI=1;
  *b=2;

  // c++11:   MayAlias (??)
  // llvm:    NoAlias  *********** wrong code
  // future:  MayAlias
}
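In test_u00f, the aliasing is created by the caller, not visible inside
the function: a legitimate caller can pass a pointer to the union's own
short member as b. A minimal sketch (hypothetical caller) follows. After
the call, reading u.mS is legal because mS was the last member written,
and it must observe 2.

```c
union U00 {
  short mS;
  int   mI;
};

/* Body of test_u00f. */
static void u00f_body(union U00 *a, short *b)
{
  a->mI = 1;
  *b = 2;
}

/* Hypothetical caller: b points into the same union as a. */
static short u00f_caller(void)
{
  union U00 u;
  u00f_body(&u, &u.mS);
  return u.mS;
}
```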

void test_u00i(union U00* a, long* b)
{
  a->mI=1;
  *b=2;

  // c++11:   NoAlias
  // llvm:    NoAlias
  // future:  NoAlias
}

void test_u00z0(union U00* a, char* b)
{
  a->mI=1;
  *b=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

// --------------------------------------
// union with array
union U03 {
  short mS[4];
  int mI;
};

void test_u03a(union U03* a, union U03* b)
{
  a->mS[0]=1;
  b->mS[0]=2;

  // c++11:   MayAlias
  // llvm:    MayAlias
  // future:  MayAlias
}

void test_u03b(union U03* a, union U03* b)
{
  a->mS[0]=1;
  b->mI=2;

  // c++11:   MayAlias
  // llvm:    NoAlias  ****** wrong code
  // future:  MayAlias
}

void test_u03d(union U03* a, union U03* b)
{
  a->mS[0]=1;
  b->mS[1]=2;

  // c++11:   NoAlias
  // llvm:    MayAlias   ****** performance
  // future:  NoAlias
}
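The difference between test_u03b and test_u03d is exactly the offset/size
part of the 'future' guideline. This sketch (hypothetical helper names)
compares byte ranges within union U03: mS[0] overlaps mI, while mS[0] and
mS[1] are disjoint.

```c
#include <stdbool.h>
#include <stddef.h>

union U03 {
  short mS[4];
  int   mI;
};

/* True if [off1, off1 + size1) and [off2, off2 + size2) share a byte. */
static bool ranges_overlap(size_t off1, size_t size1,
                           size_t off2, size_t size2)
{
  return off1 < off2 + size2 && off2 < off1 + size1;
}

/* test_u03b: a->mS[0] vs b->mI. */
static bool u03b_overlap(void)
{
  return ranges_overlap(offsetof(union U03, mS[0]), sizeof(short),
                        offsetof(union U03, mI), sizeof(int));
}

/* test_u03d: a->mS[0] vs b->mS[1]. */
static bool u03d_overlap(void)
{
  return ranges_overlap(offsetof(union U03, mS[0]), sizeof(short),
                        offsetof(union U03, mS[1]), sizeof(short));
}
```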


On Thu, Mar 19, 2015 at 12:49 PM, Jeroen Dobbelaere <
jeroen.dobbelaere at gmail.com> wrote:

>
>
> On Wed, Mar 18, 2015 at 8:02 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>>
>>
>> [...]
>> However, another point to consider is that, "C++ standard experts, at
>> least on the GCC side, did not view this as saying "all accesses must have
>> an explicit union access", but that..." may not be a reasonable reading of
>> the standard. The text says, "If a program attempts to access the stored
>> value of an object through a glvalue of other than one
>>
>>> of the following types the behavior is undefined: ...", and so I read
>>> that as saying that the access must be through a glvalue of an aggregate or
>>> union type (with a relevant type as one of its members). So the union or
>>> aggregate type *must* be explicitly present in the access. I don't believe
>>> any other interpretations, especially those which imply non-local effects,
>>> are reasonable.
>>>
>>> Exactly what rule did GCC implement?
>>>
>>
>> If you use an explicit union access, it assumes they alias,  otherwise,
>> all bets are off.
>>
>> Note: As you all know, I have really no dog in this fight, nor am i a
>> language lawyer. Among other things, i'm an aliasing guy. I just do the
>> edge of  what the language lawyers tell me is allowed :)
>>
>>
> I propose that we implement the same behavior in llvm.
> I'm quite busy right now, but I'll try to come up with some examples later
> today for which we can then annotate what kind of behavior the standard
> would allow (using Richard's interpretation), what gcc is doing, what we
> want to do.
>
> Greetings,
>
> Jeroen Dobbelaere
>



-- 
Jeroen Dobbelaere


More information about the cfe-commits mailing list