[LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Hal Finkel hfinkel at anl.gov
Tue Aug 19 16:57:35 PDT 2014


----- Original Message -----
> From: "Richard Smith" <richard at metafoo.co.uk>
> To: "Daniel Berlin" <dberlin at dberlin.org>
> Cc: "Reid Kleckner" <rnk at google.com>, "Hal Finkel" <hfinkel at anl.gov>, "Daniel Berlin" <dannyb at google.com>, "LLVM
> Developers Mailing List" <llvmdev at cs.uiuc.edu>, "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>
> Sent: Monday, August 11, 2014 9:31:09 PM
> Subject: Re: [LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
> 
> 
> 
> 
> I'll take this from the C++ angle; the C rules are not the same, and
> I'm not confident they give the same answer.
> 
> 
> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin < dberlin at dberlin.org
> > wrote:
> 
> 
> The access path matters (in some sense), but this is, AFIAK, valid no
> matter how you look at it.
> 
> Let's take a look line by line
> 
> 
> #include <stdio.h>
> struct heap {int index; int b;};
> struct heap **ptr;
> int aa;
> 
> int main() {
> struct heap element;
> struct heap *array[2];
> array[0] = (struct heap *)&aa; <- Okay so far.
> array[1] = &element; <- Clearly okay
> ptr = array; <- still okay so far
> aa = 1; <- not pointer related.
> int i; <- not pointer related
> for (i =0; i< 2; i++) { <- not pointer related
> printf("i is %d, aa is %d\n", i, aa); <- not pointer related
> ptr[i]->index = 0; <- Here is where it gets wonky.
> 
> <rest of codeis irrelevan>
> 
> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3,
> check
> footnote 59).
> 
> 
> 
> This is where we get the undefined behavior.
> 
> 
> 
> 3.8/6: "[If you have a glvalue referring to storage but where there
> is no corresponding object, the] program has undefined behavior if:
> 
> [...] the glvalue is used to access a non-static data member".
> 
> 
> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we
> can only create objects through definitions, new-expressions, and by
> creating temporary objects). So the behavior is undefined when we
> evaluate ptr[0]->index.

Any thoughts on how we might communicate this information to the optimizer? In this case we might be able to infer something from the struct-path aware TBAA (because reaching the int though the structure has a different path length than reaching the top-level int), but I'm not sure that addresses the general case.

Thanks again,
Hal

> 
> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
> because it doesn't end up making a difference)
> 
> Let's assume, for the sake of argument, the actual access to aa
> occurs through an lvalue of type "struct heap" rather than "int"
> 
> In C++03 and C++11, it says:
> 
> An object shall have its stored value accessed only by an lvalue
> expression that has one of the following types:
> 
> 
> a type compatible with the effective type of the object,
> a qualified version of a type compatible with the effective type of
> the object,
> a type that is the signed or unsigned type corresponding to the
> 
> effective type of the object,
> a type that is the signed or unsigned type corresponding to a
> qualified version of the effective type of the object,
> 
> an aggregate or union type that includes one of the aforementioned
> types among its members (including, recursively, a member of a
> subaggregate or contained union), or
> a character type.
> (C++11 adds something about dynamic type here)
> 
> struct heap is "an aggregate or union type that includes one of the
> aforementioned types among it's members".
> 
> Thus, this is legal to access this int through an lvalue expression
> that has a type of struct heap.
> Whether the actual store is legal for other reasons, i don't know.
> There are all kinds of rules about object alignment and value
> representation that aren't my baliwick. I leave it to another
> language lawyer to say whether it's okay to do a store to something
> that is essentially, a partial object.
> 
> Note that GCC actually knows this is legal to alias, at least at the
> tree level. I debugged it there, and it definitely isn't eliminating
> it at a high level. It also completely understands the call to
> ->index
> = 0 affects "aa", and has a reload for aa before the printf call.
> 
> I don't know what is eliminating this at the RTL level, but i can't
> see why it's illegal from *aliasing rules*. Maybe this is invalid for
> some other reason.
> 
> 
> 
> 
> 
> 
> 
> }
> return 0;
> 
> }
> 
> ptr[i]->index = 0;
> 
> 
> 
> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner < rnk at google.com >
> wrote:
> > +aliasing people
> > 
> > I *think* this is valid, because the rules have always been
> > described to me
> > in terms of underlying storage type, and not access path. These are
> > all
> > ints, so all loads and stores can alias.
> > 
> > 
> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >> 
> >> ----- Original Message -----
> >> > From: "Tim Northover" < t.p.northover at gmail.com >
> >> > To: "Jonas Wagner" < jonas.wagner at epfl.ch >
> >> > Cc: "cfe-dev Developers" < cfe-dev at cs.uiuc.edu >, "LLVM
> >> > Developers Mailing
> >> > List" < llvmdev at cs.uiuc.edu >
> >> > Sent: Friday, August 8, 2014 6:54:50 AM
> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc
> >> > too
> >> > aggressive or LLVM need to improve?
> >> > 
> >> > > your C program invokes undefined behavior when it dereferences
> >> > > pointers that
> >> > > have been converted to other types. See for example
> >> > > 
> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
> >> > 
> >> > I don't think it's quite that simple.The type-based aliasing
> >> > rules
> >> > come from 6.5p7 of C11, I think. That says:
> >> > 
> >> > "An object shall have its stored value accessed only by an
> >> > lvalue
> >> > expression that has one of
> >> > the following types:
> >> > + a type compatible with the effective type of the object,
> >> > [...]
> >> > + an aggregate or union type that includes one of the
> >> > aforementioned
> >> > types among its members [...]"
> >> > 
> >> > That would seem to allow this usage: aa (effective type "int")
> >> > is
> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
> >> > 
> >> > The second point would even seem to allow something like "ptr[i]
> >> > =
> >> > ..." if aa was declared "int aa[2];", though that seems to be
> >> > going
> >> > too far. It also seems to be very difficult to pin down a
> >> > meaning
> >> > (from the standard) for "a->b" if a is not a pointer to an
> >> > object
> >> > with
> >> > the correct effective type. So the entire area is probably one
> >> > that's
> >> > open to interpretation.
> >> > 
> >> > I've added cfe-dev to the list; they're the *professional*
> >> > language
> >> > lawyers.
> >> 
> >> Coincidentally, this also seems to be PR20585 (adding Jiangning
> >> Liu, the
> >> reporter of that bug, to this thread too).
> >> 
> >> -Hal
> >> 
> >> > 
> >> > Cheers.
> >> > 
> >> > Tim.
> >> > _______________________________________________
> >> > cfe-dev mailing list
> >> > cfe-dev at cs.uiuc.edu
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> > 
> >> 
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > 
> > 
> > 
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



More information about the llvm-dev mailing list