[LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Daniel Berlin dberlin at dberlin.org
Mon Aug 11 21:02:21 PDT 2014


So then there you go, a real language lawyer says it's invalid for
other reasons :)


On Mon, Aug 11, 2014 at 7:31 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> I'll take this from the C++ angle; the C rules are not the same, and I'm not
> confident they give the same answer.
>
> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> The access path matters (in some sense), but this is, AFIAK, valid no
>> matter how you look at it.
>>
>> Let's take a look line by line
>>
>> #include <stdio.h>
>> struct heap {int index; int b;};
>> struct heap **ptr;
>> int aa;
>>
>> int main() {
>>   struct heap element;
>>   struct heap *array[2];
>>   array[0] = (struct heap *)&aa; <- Okay so far.
>>   array[1] = &element; <- Clearly okay
>>   ptr = array; <- still okay so far
>>   aa = 1; <- not pointer related.
>>   int i; <- not pointer related
>>   for (i =0; i< 2; i++) { <- not pointer related
>>     printf("i is %d, aa is %d\n", i, aa); <- not pointer related
>>     ptr[i]->index = 0; <- Here is where it gets wonky.
>>
>> <rest of codeis irrelevan>
>>
>> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
>> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
>> footnote 59).
>
>
> This is where we get the undefined behavior.
>
> 3.8/6: "[If you have a glvalue referring to storage but where there is no
> corresponding object, the] program has undefined behavior if:
> [...] the glvalue is used to access a non-static data member".
>
> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can only
> create objects through definitions, new-expressions, and by creating
> temporary objects). So the behavior is undefined when we evaluate
> ptr[0]->index.
>
>> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
>> because it doesn't end up making a difference)
>>
>> Let's assume, for the sake of argument,  the actual access to aa
>> occurs through an lvalue of type "struct heap" rather than "int"
>> In C++03 and C++11, it says:
>>
>> An object shall have its stored value accessed only by an lvalue
>> expression that has one of the following types:
>>
>> a type compatible with the effective type of the object,
>> a qualified version of a type compatible with the effective type of the
>> object,
>> a type that is the signed or unsigned type corresponding to the
>> effective type of the object,
>> a type that is the signed or unsigned type corresponding to a
>> qualified version of the effective type of the object,
>> an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a
>> subaggregate or contained union), or
>> a character type.
>> (C++11 adds something about dynamic type here)
>>
>> struct heap is "an aggregate or union type that includes one of the
>> aforementioned types among it's members".
>>
>> Thus, this is legal to access this int through an lvalue expression
>> that has a type of struct heap.
>> Whether the actual store is legal for other reasons, i don't know.
>> There are all kinds of rules about object alignment and value
>> representation that aren't my baliwick.  I leave it to another
>> language lawyer to say whether it's okay to do a store to something
>> that is essentially, a partial object.
>>
>> Note that GCC actually knows this is legal to alias, at least at the
>> tree level. I debugged it there, and it definitely isn't eliminating
>> it at a high level. It also completely understands the call to ->index
>> = 0 affects "aa", and has a reload for aa before the printf call.
>>
>> I don't know what is eliminating this at the RTL level, but i can't
>> see why it's illegal from *aliasing rules*. Maybe this is invalid for
>> some other reason.
>>
>>
>>
>>
>>
>>
>>
>>   }
>>   return 0;
>> }
>>
>> ptr[i]->index = 0;
>>
>> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <rnk at google.com> wrote:
>> > +aliasing people
>> >
>> > I *think* this is valid, because the rules have always been described to
>> > me
>> > in terms of underlying storage type, and not access path. These are all
>> > ints, so all loads and stores can alias.
>> >
>> >
>> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >>
>> >> ----- Original Message -----
>> >> > From: "Tim Northover" <t.p.northover at gmail.com>
>> >> > To: "Jonas Wagner" <jonas.wagner at epfl.ch>
>> >> > Cc: "cfe-dev Developers" <cfe-dev at cs.uiuc.edu>, "LLVM Developers
>> >> > Mailing
>> >> > List" <llvmdev at cs.uiuc.edu>
>> >> > Sent: Friday, August 8, 2014 6:54:50 AM
>> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
>> >> > aggressive or LLVM need to improve?
>> >> >
>> >> > > your C program invokes undefined behavior when it dereferences
>> >> > > pointers that
>> >> > > have been converted to other types. See for example
>> >> > >
>> >> > >
>> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >> >
>> >> > I don't think it's quite that simple.The type-based aliasing rules
>> >> > come from 6.5p7 of C11, I think. That says:
>> >> >
>> >> > "An object shall have its stored value accessed only by an lvalue
>> >> > expression that has one of
>> >> > the following types:
>> >> >   + a type compatible with the effective type of the object,
>> >> >   [...]
>> >> >   + an aggregate or union type that includes one of the
>> >> >   aforementioned
>> >> > types among its members [...]"
>> >> >
>> >> > That would seem to allow this usage: aa (effective type "int") is
>> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >> >
>> >> > The second point would even seem to allow something like "ptr[i] =
>> >> > ..." if aa was declared "int aa[2];", though that seems to be going
>> >> > too far. It also seems to be very difficult to pin down a meaning
>> >> > (from the standard) for "a->b" if a is not a pointer to an object
>> >> > with
>> >> > the correct effective type. So the entire area is probably one that's
>> >> > open to interpretation.
>> >> >
>> >> > I've added cfe-dev to the list; they're the *professional* language
>> >> > lawyers.
>> >>
>> >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu,
>> >> the
>> >> reporter of that bug, to this thread too).
>> >>
>> >>  -Hal
>> >>
>> >> >
>> >> > Cheers.
>> >> >
>> >> > Tim.
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > cfe-dev at cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >>
>> >> --
>> >> Hal Finkel
>> >> Assistant Computational Scientist
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>
>



More information about the llvm-dev mailing list