[llvm-dev] InstCombine GEP
Nuno Lopes via llvm-dev
llvm-dev at lists.llvm.org
Mon Aug 14 04:39:58 PDT 2017
> On Thu, Aug 10, 2017 at 10:58 AM, Nuno Lopes wrote:
>>> On Thu, Aug 10, 2017 at 12:22 AM, Nema, Ashutosh via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> I’m not sure how transforming GEP offset to i8 type will help alias
>>>> analysis & SROA for the mentioned test case.
>>>
>>> It should neither help nor hinder AA or SROA -- the two GEPs (the
>>> complex one and the simple one) are equivalent.
>> Since memory isn't typed in LLVM, having the GEP in terms of %struct.ABC
>> does not provide any extra information.
>>
>> Memory is somewhat typed, since if you store something with a type and
>> load the same location with a different type that's not valid (let's call
>> it poison).
>
> That may be true in C++, but I'm not sure if we want that to be true
> in LLVM IR. We would not be able to inline memcpy's if that were
> true, for one thing (e.g. https://godbolt.org/g/2VVJHU). Unless
> you're talking about TBAA metadata?
Ah, that's a very good point. This is a simplified version of your example:
https://godbolt.org/g/RyZYga
memcpy is transformed into a store of an int, which is then loaded as float.
Well, at least according to LLVM semantics, memory records the size of the
last type stored, such that it's invalid to store an i12 and load an i13. Not sure
why this restriction in the semantics is actually needed, though. If you
read a smaller/larger type than what was stored, you may end up with some
padding bits (poison). That's it.
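
For readers who cannot open the godbolt links, here is a minimal C sketch of the pattern under discussion. It is an illustration, not the exact snippet behind the links: the function names are hypothetical, and the demo assumes float is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy the bytes of a 32-bit
   integer into a float. */
float int_bits_as_float(uint32_t bits) {
    float f;
    /* The message above describes a fixed-size memcpy like this one being
       turned into an integer store whose location is then loaded as float. */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Usage sketch. Assumption: float is 32-bit IEEE 754, so the bit pattern
   0x3f800000 encodes 1.0f. */
void int_bits_as_float_demo(void) {
    assert(sizeof(float) == sizeof(uint32_t));
    assert(int_bits_as_float(0x3f800000u) == 1.0f);
}
```

If storing one type and loading another were invalid in the IR, this inlining of memcpy could not be done, which is the objection raised in the quoted text.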
>> Also, BasicAA has the following rule, with constants c1 and c2, and
>> arbitrary values x, y:
>> a[x][c1] no-alias a[y][c2] if:
>> the distance between c1 and c2 is sufficient to guarantee that the
>> accesses will be disjoint due to ending up in different array slots.
>> For this rule it's important to know what's the size of each array
>> element. This information is lost if GEPs are flattened.
>
> Do you mean to say that in LLVM IR we will conclude ptr0 and ptr1 don't
> alias:
>
> int a[4][4];
> ptr0 = &a[x][3];
> ptr1 = &a[y][7];
>
> If so, that doesn't match my understanding -- I was under the
> impression that in LLVM IR x = 2, y = 1 will give us must-alias
> between ptr0 and ptr1.
No, in this case it won't conclude no-alias, since 3 % 4 == 7 % 4. LLVM is
not that aggressive in exploiting UB. Anyway, concluding no-alias here would
only be possible if the GEP index had the inrange attribute.
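
The arithmetic behind the quoted x = 2, y = 1 case can be checked with a small sketch. The helper below is hypothetical and only computes byte offsets for `int a[4][4]` as flat address arithmetic; it does not index the array, since `a[1][7]` itself is out of bounds in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of element [row][col] from the start of
   `int a[4][4]`, computed as plain address arithmetic with no bounds check. */
size_t flat_offset_4x4(size_t row, size_t col) {
    const size_t cols = 4; /* inner dimension of a[4][4] */
    return (row * cols + col) * sizeof(int);
}

/* Usage sketch: &a[2][3] and &a[1][7] land on the same byte offset, and the
   two constant column indices agree modulo the inner dimension. */
void flat_offset_demo(void) {
    assert(flat_offset_4x4(2, 3) == flat_offset_4x4(1, 7));
    assert(3 % 4 == 7 % 4);
}
```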
The example is more like this:
int a[4][5];
p = &a[x][0];
q = &a[y][1];
With access sizes sp, sq, respectively:
If the access through p ends before q's offset within the inner array
(1*sizeof(int) >= sp) and the access through q doesn't go beyond the inner
array limit (sq <= 5*sizeof(int) - 1*sizeof(int)), then it's no-alias.
By flattening a GEP, you lose the information about the size of each of the
array/struct constituents. Hence this proof rule doesn't apply and you would
get may-alias for the example above.
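
The rule can be sketched in C as follows. This is a sketch of the two conditions stated above, generalized to a nonzero constant offset for p; it is not the code in BasicAliasAnalysis.cpp, and the function and parameter names are hypothetical. All quantities are in bytes, and `row_bytes` is the inner array size that a flattened GEP no longer exposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: p = &a[x][c1], q = &a[y][c2], with x and y unknown.
   off_p and off_q are the constant byte offsets within one inner array,
   size_p and size_q are the access sizes. Returns true only when the
   accesses are provably disjoint; false means "may alias". */
bool no_alias_by_inner_offset(size_t row_bytes,
                              size_t off_p, size_t size_p,
                              size_t off_q, size_t size_q) {
    if (row_bytes == 0 || off_p > off_q)
        return false; /* stay conservative for cases this sketch skips */
    /* q - p is k * row_bytes + d for some integer k. */
    size_t d = (off_q - off_p) % row_bytes;
    /* k >= 0: p's access must end before q starts (d >= size_p).
       k <  0: q's access must end before p starts (size_q <= row_bytes - d). */
    return d >= size_p && size_q <= row_bytes - d;
}

/* Usage sketch for int a[4][5] with p = &a[x][0], q = &a[y][1], and for the
   earlier a[4][4] case with constant indices 3 and 7. */
void no_alias_demo(void) {
    size_t row = 5 * sizeof(int);
    assert(no_alias_by_inner_offset(row, 0, sizeof(int), sizeof(int), sizeof(int)));
    assert(!no_alias_by_inner_offset(row, 0, 2 * sizeof(int), sizeof(int), sizeof(int)));
    assert(!no_alias_by_inner_offset(4 * sizeof(int), 3 * sizeof(int), sizeof(int),
                                     7 * sizeof(int), sizeof(int)));
}
```

Without `row_bytes` the modulo step has nothing to work with, which is why the flattened form falls back to may-alias.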
Another interesting conclusion is that LLVM is being quite nice by allowing
accesses to multiple array/struct fields through the address of one of them.
The code is here:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/BasicAliasAnalysis.cpp?revision=310766&view=markup#l1349
(you may need to scroll back to line 1294 or even to the beginning of that
function to see where all the data comes from)
Nuno