[LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Kevin Qin kevinqindev at gmail.com
Mon Aug 11 23:44:15 PDT 2014


Hi all,

Thanks for you paying time to look at this issue. I'm not an expert for
C/C++ language, so I can just post some experiment results from LLVM and
GCC.

If we make minor changes to the test, gcc may give different results.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global
lvalue is used.
  }
  return 0;
}

Result didn't get changed,

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

But if we change a bit more, like

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  //ptr = array; // remove this assignment as well.
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global
lvalue is used.
  }
  return 0;
}

Result changed to be the same as LLVM.

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 0

I don't know why a assignement statment to a unrelated global pointer will
affect gcc's aliasing work, and I don't know from the language point of
view, if we use a local pointer to replace the global pointer, then the
result would be still undefined or not.

Regards,
Kevin


2014-08-12 12:02 GMT+08:00 Daniel Berlin <dberlin at dberlin.org>:

> So then there you go, a real language lawyer says it's invalid for
> other reasons :)
>
>
> On Mon, Aug 11, 2014 at 7:31 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
> > I'll take this from the C++ angle; the C rules are not the same, and I'm
> not
> > confident they give the same answer.
> >
> > On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
> >>
> >> The access path matters (in some sense), but this is, AFIAK, valid no
> >> matter how you look at it.
> >>
> >> Let's take a look line by line
> >>
> >> #include <stdio.h>
> >> struct heap {int index; int b;};
> >> struct heap **ptr;
> >> int aa;
> >>
> >> int main() {
> >>   struct heap element;
> >>   struct heap *array[2];
> >>   array[0] = (struct heap *)&aa; <- Okay so far.
> >>   array[1] = &element; <- Clearly okay
> >>   ptr = array; <- still okay so far
> >>   aa = 1; <- not pointer related.
> >>   int i; <- not pointer related
> >>   for (i =0; i< 2; i++) { <- not pointer related
> >>     printf("i is %d, aa is %d\n", i, aa); <- not pointer related
> >>     ptr[i]->index = 0; <- Here is where it gets wonky.
> >>
> >> <rest of codeis irrelevan>
> >>
> >> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
> >> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
> >> footnote 59).
> >
> >
> > This is where we get the undefined behavior.
> >
> > 3.8/6: "[If you have a glvalue referring to storage but where there is no
> > corresponding object, the] program has undefined behavior if:
> > [...] the glvalue is used to access a non-static data member".
> >
> > There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can
> only
> > create objects through definitions, new-expressions, and by creating
> > temporary objects). So the behavior is undefined when we evaluate
> > ptr[0]->index.
> >
> >> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
> >> because it doesn't end up making a difference)
> >>
> >> Let's assume, for the sake of argument,  the actual access to aa
> >> occurs through an lvalue of type "struct heap" rather than "int"
> >> In C++03 and C++11, it says:
> >>
> >> An object shall have its stored value accessed only by an lvalue
> >> expression that has one of the following types:
> >>
> >> a type compatible with the effective type of the object,
> >> a qualified version of a type compatible with the effective type of the
> >> object,
> >> a type that is the signed or unsigned type corresponding to the
> >> effective type of the object,
> >> a type that is the signed or unsigned type corresponding to a
> >> qualified version of the effective type of the object,
> >> an aggregate or union type that includes one of the aforementioned
> >> types among its members (including, recursively, a member of a
> >> subaggregate or contained union), or
> >> a character type.
> >> (C++11 adds something about dynamic type here)
> >>
> >> struct heap is "an aggregate or union type that includes one of the
> >> aforementioned types among it's members".
> >>
> >> Thus, this is legal to access this int through an lvalue expression
> >> that has a type of struct heap.
> >> Whether the actual store is legal for other reasons, i don't know.
> >> There are all kinds of rules about object alignment and value
> >> representation that aren't my baliwick.  I leave it to another
> >> language lawyer to say whether it's okay to do a store to something
> >> that is essentially, a partial object.
> >>
> >> Note that GCC actually knows this is legal to alias, at least at the
> >> tree level. I debugged it there, and it definitely isn't eliminating
> >> it at a high level. It also completely understands the call to ->index
> >> = 0 affects "aa", and has a reload for aa before the printf call.
> >>
> >> I don't know what is eliminating this at the RTL level, but i can't
> >> see why it's illegal from *aliasing rules*. Maybe this is invalid for
> >> some other reason.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>   }
> >>   return 0;
> >> }
> >>
> >> ptr[i]->index = 0;
> >>
> >> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <rnk at google.com> wrote:
> >> > +aliasing people
> >> >
> >> > I *think* this is valid, because the rules have always been described
> to
> >> > me
> >> > in terms of underlying storage type, and not access path. These are
> all
> >> > ints, so all loads and stores can alias.
> >> >
> >> >
> >> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> >>
> >> >> ----- Original Message -----
> >> >> > From: "Tim Northover" <t.p.northover at gmail.com>
> >> >> > To: "Jonas Wagner" <jonas.wagner at epfl.ch>
> >> >> > Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>, "LLVM Developers
> >> >> > Mailing
> >> >> > List" <llvmdev at cs.uiuc.edu>
> >> >> > Sent: Friday, August 8, 2014 6:54:50 AM
> >> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
> >> >> > aggressive or LLVM need to improve?
> >> >> >
> >> >> > > your C program invokes undefined behavior when it dereferences
> >> >> > > pointers that
> >> >> > > have been converted to other types. See for example
> >> >> > >
> >> >> > >
> >> >> > >
> http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
> >> >> >
> >> >> > I don't think it's quite that simple.The type-based aliasing rules
> >> >> > come from 6.5p7 of C11, I think. That says:
> >> >> >
> >> >> > "An object shall have its stored value accessed only by an lvalue
> >> >> > expression that has one of
> >> >> > the following types:
> >> >> >   + a type compatible with the effective type of the object,
> >> >> >   [...]
> >> >> >   + an aggregate or union type that includes one of the
> >> >> >   aforementioned
> >> >> > types among its members [...]"
> >> >> >
> >> >> > That would seem to allow this usage: aa (effective type "int") is
> >> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
> >> >> >
> >> >> > The second point would even seem to allow something like "ptr[i] =
> >> >> > ..." if aa was declared "int aa[2];", though that seems to be going
> >> >> > too far. It also seems to be very difficult to pin down a meaning
> >> >> > (from the standard) for "a->b" if a is not a pointer to an object
> >> >> > with
> >> >> > the correct effective type. So the entire area is probably one
> that's
> >> >> > open to interpretation.
> >> >> >
> >> >> > I've added cfe-dev to the list; they're the *professional* language
> >> >> > lawyers.
> >> >>
> >> >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu,
> >> >> the
> >> >> reporter of that bug, to this thread too).
> >> >>
> >> >>  -Hal
> >> >>
> >> >> >
> >> >> > Cheers.
> >> >> >
> >> >> > Tim.
> >> >> > _______________________________________________
> >> >> > cfe-dev mailing list
> >> >> > cfe-dev at cs.uiuc.edu
> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >> >
> >> >>
> >> >> --
> >> >> Hal Finkel
> >> >> Assistant Computational Scientist
> >> >> Leadership Computing Facility
> >> >> Argonne National Laboratory
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >
> >
> >
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>



-- 
Best Regards,

Kevin Qin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140812/33f90869/attachment.html>


More information about the llvm-dev mailing list