[llvm-dev] InstCombine GEP

Mon Aug 14 04:39:58 PDT 2017

> On Thu, Aug 10, 2017 at 10:58 AM, Nuno Lopes wrote:
>>> On Thu, Aug 10, 2017 at 12:22 AM, Nema, Ashutosh via llvm-dev 
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> I’m not sure how transforming GEP offset to i8 type will help alias
>>>> analysis & SROA for the mentioned test case.
>>>
>>> It should neither help nor hinder AA or SROA -- the two GEPs (the 
>>> complex one and the simple one) are equivalent.
>> Since memory isn't typed in LLVM, having the GEP in terms of %struct.ABC 
>> does not provide any extra information.
>>
>> Memory is somewhat typed, since if you store something with a type and 
>> load the same location with a different type that's not valid (let's call 
>> it poison).
>
> That may be true in C++, but I'm not sure if we want that to be true
> in LLVM IR.  We would not be able to inline memcpy's if that were
> true, for one thing (e.g. https://godbolt.org/g/2VVJHU).  Unless
> you're talking about TBAA metadata?

Ah, that's a very good point.  This is a simplified version of your example: 
https://godbolt.org/g/RyZYga
The memcpy is transformed into a store of an int, which is then loaded as a float.
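A minimal C sketch of the same pattern. This is my own reconstruction, not the code behind the godbolt link. bits_to_float is a hypothetical name, and the sketch assumes a 32-bit IEEE-754 float. The memcpy is the portable way to reinterpret the bits, and it is the kind of call that an optimizer may lower to an integer store followed by a float load of the same bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as a float (hypothetical helper).
   memcpy is the well-defined way to do this in C; the optimizer is free to
   turn it into a plain integer store followed by a float load. */
static float bits_to_float(uint32_t bits) {
    _Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, bits_to_float(0x3f800000u) returns 1.0f.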

Well, at least according to LLVM semantics, memory records the size of the
last stored type, so it's invalid to store an i12 and then load an i13.  I'm
not sure why this restriction in the semantics is actually needed, though.  If
you read a smaller or larger type than what was stored, you may end up with
some padding bits (poison).  That's it.
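To make the difference concrete, here is a toy C model of a single 64-bit memory cell with per-bit poison tracking. It is hypothetical and not how LLVM implements anything. It assumes a little-endian layout and that an iN store writes N value bits followed by padding up to a whole byte. The "strict" mode is an assumed reading of the rule above: a load whose width is not a whole number of bytes must match the last store's width, otherwise the whole result is poison. The "relaxed" mode only poisons the padding or unwritten bits that are actually read.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical): bit i of the cell is bit i of any loaded value. */
typedef struct {
    uint64_t value;      /* stored bits */
    uint64_t poison;     /* bit i set => bit i is padding or never written */
    unsigned last_width; /* bit width of the last store, 0 if none */
} ToyMem;

static uint64_t low_mask(unsigned n) {
    return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

static void toy_init(ToyMem *m) {
    m->value = 0;
    m->poison = ~UINT64_C(0); /* nothing written yet */
    m->last_width = 0;
}

/* Store an iN value: N value bits, then padding up to a whole byte. */
static void toy_store(ToyMem *m, unsigned n, uint64_t v) {
    assert(n >= 1 && n <= 64);
    uint64_t store_mask = low_mask((n + 7) / 8 * 8);
    uint64_t value_mask = low_mask(n);
    m->value = (m->value & ~store_mask) | (v & value_mask);
    m->poison = (m->poison & ~store_mask) | (store_mask & ~value_mask);
    m->last_width = n;
}

/* Poison mask of an iM load.
   strict:  a non-byte-sized load must match the last store width, else the
            whole result is poison (assumed reading of the rule).
   relaxed: only the padding/unwritten bits that are read are poison. */
static uint64_t toy_load_poison(const ToyMem *m, unsigned width, int strict) {
    assert(width >= 1 && width <= 64);
    if (strict && width % 8 != 0 && width != m->last_width)
        return low_mask(width);
    return m->poison & low_mask(width);
}

static uint64_t toy_load_value(const ToyMem *m, unsigned width) {
    return m->value & low_mask(width);
}
```

After toy_store(&m, 12, 0xABC), an i13 load has poison mask 0x1FFF under the strict rule but only 0x1000 (bit 12, the first padding bit) under the relaxed one, while an i8 load is poison-free either way and yields 0xBC.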

>> Also, BasicAA has the following rule, with constants c1 and c2, and 
>> arbitrary values x, y:
>> a[x][c1] no-alias a[y][c2] if:
>> the distance between c1 and c2 is sufficient to guarantee that the 
>> accesses will be disjoint due to ending up in different array slots.
>> For this rule it's important to know the size of each array
>> element. This information is lost if GEPs are flattened.
>
> Do you mean to say that in LLVM IR we will conclude ptr0 and ptr1 don't 
> alias:
>
>   int a[4][4];
>   ptr0 = &a[x][3];
>   ptr1 = &a[y][7];
>
> If so, that doesn't match my understanding -- I was under the
> impression that in LLVM IR x = 2, y = 1 will give us must-alias
> between ptr0 and ptr1.

No, in this case it won't conclude no-alias, since 3 % 4 == 7 % 4.  LLVM is
not that aggressive in exploiting UB.  Anyway, concluding no-alias here would
only be possible if the GEP index had the inrange attribute.
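The arithmetic behind the must-alias answer, as a small C sketch. elem_offset is a hypothetical helper. It computes byte offsets with integers rather than forming the out-of-bounds pointer &a[1][7], which would be undefined in C. With x = 2 and y = 1, both indices land on flat element 2*4+3 == 1*4+7 == 11.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of &a[i][j] for a hypothetical int a[ROWS][cols], computed
   with plain integer arithmetic so that an out-of-range column index
   (such as 7 in a row of 4) never forms an invalid pointer. */
static size_t elem_offset(size_t cols, size_t i, size_t j) {
    return (i * cols + j) * sizeof(int);
}
```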

The example is more like this:
  int a[4][5];
  p = &a[x][0];
  q = &a[y][1];

With access sizes sp and sq, respectively: if the access through p ends
before q starts (sp <= 1*sizeof(int), the offset of q within its row) and the
access through q doesn't go beyond the end of its row
(sq <= 5*sizeof(int) - 1*sizeof(int)), then p and q don't alias.
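A simplified C model of that rule. rows_no_alias and brute_no_alias are hypothetical names, and this is not the BasicAA implementation, which works on decomposed GEP offsets and scales and may be more conservative in detail. The model assumes off_p <= off_q < row, where off_p and off_q are the byte offsets of p and q within their rows. With off_p = 0 and off_q = 1*sizeof(int) it reduces to the two conditions above. The brute-force function checks the rule against every pair of rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* p = row*x + off_p and q = row*y + off_q for arbitrary rows x, y.
   Since rows are `row` bytes apart, [p, p+sp) and [q, q+sq) are disjoint
   for every choice of rows iff neither access spills into the other
   modulo the row size. */
static bool rows_no_alias(size_t row, size_t off_p, size_t sp,
                          size_t off_q, size_t sq) {
    assert(off_p <= off_q && off_q < row);
    return off_p + sp <= off_q        /* p's access ends before q starts */
        && off_q + sq <= row + off_p; /* q's access ends before the next p */
}

/* Exhaustive check over rows x, y in [0, nrows). */
static bool brute_no_alias(size_t row, size_t nrows, size_t off_p, size_t sp,
                           size_t off_q, size_t sq) {
    for (size_t x = 0; x < nrows; x++)
        for (size_t y = 0; y < nrows; y++) {
            size_t p = x * row + off_p, q = y * row + off_q;
            if (p < q + sq && q < p + sp) /* half-open intervals overlap */
                return false;
        }
    return true;
}
```

For int a[4][5], rows_no_alias(5*sizeof(int), 0, sizeof(int), sizeof(int), sizeof(int)) is true, while doubling sp to 2*sizeof(int) makes p's access reach into q and the rule (and the brute-force check) gives may-alias.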

By flattening a GEP, you lose the size of each of the array/struct
constituents.  Hence this proof rule doesn't apply, and you would get
may-alias for the example above.
Another interesting conclusion is that LLVM is being quite nice by allowing 
accesses to multiple array/struct fields through the address of one of them.
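To illustrate the flattening in C terms, here is a sketch using the int a[4][5] example. The helper names structured and flattened are hypothetical. Both compute the same address; what the flattened form drops is the row type, so the row size 5*sizeof(int) survives only as a multiplier inside a single integer byte offset.

```c
#include <assert.h>
#include <stddef.h>

int a[4][5]; /* the example array */

/* Structured form: the row type int[5] is part of the address expression,
   so the row size is known from the type. */
static int *structured(size_t x) { return &a[x][1]; }

/* Flattened form: the same address as one byte offset from the base;
   the row size is now just a constant folded into the offset. */
static int *flattened(size_t x) {
    return (int *)((char *)a + x * sizeof a[0] + 1 * sizeof(int));
}
```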

The code is here: 
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/BasicAliasAnalysis.cpp?revision=310766&view=markup#l1349
(you may need to scroll back to line 1294 or even to the beginning of that 
function to see where all the data comes from)

Nuno