[PATCH] D59065: [BasicAA] Simplify inttoptr(and(ptrtoint(X), C)) to X, if C preserves all significant bits.

Wed Apr 3 14:53:12 PDT 2019

fhahn added a comment.

In D59065#1452543 <https://reviews.llvm.org/D59065#1452543>, @sanjoy wrote:

> In D59065#1449682 <https://reviews.llvm.org/D59065#1449682>, @fhahn wrote:
>
> > Ah thanks, together with @aqjune 's response, I think I now know what I was missing. If we have something like
> >
> >   int8_t* obj1 = malloc(4);
> >   int8_t* obj2 = malloc(4);
> >   intptr_t p = (intptr_t)(obj1 + 4);
> >   
> >   if (p != (intptr_t) obj2) return;
> >    
> >   *(int8_t*)(intptr_t)(obj1 + 4) = 0;   // <- here we alias obj1 and obj2?
> >   
> >
> > I thought the information obtained via the control flow, `p` aliases both `obj1` and `obj2`, is limited to the uses of `p`, but do I understand correctly that this is not the case and the information leaks to all equivalent expressions (that is for the snippet above, without GVN or any common code elimination)?
>
>
> Yes.  In the abstract LLVM machine pointers have provenance and integers don't.  All integers with the same bitwise value are equivalent (can be replaced one for another), but bitwise identical pointers are not necessarily equivalent.  This lets us do aggressive optimization on integers while still keeping a strong (ish) memory model.
>
> A consequence of this is that when you convert `(intptr_t)(obj1 + 4)` back to a pointer, the new pointer's provenance includes all pointers whose bitwise value could have been `obj1 + 4`.

Ah thanks, I was missing the global nature of physical pointers. I couldn't find this described anywhere (besides some of those things mentioned at a tutorial at EuroLLVM). If this is not described anywhere, do you think it would make sense to add it to the AliasAnalysis documentation page, for example?

Also, is the bitwise equality propagation function-local, or does it apply across the whole module? If it is function-local, we might be able to convert `inttoptr(and(ptrtoint(X), C))` chains to the intrinsic early on, at least in functions that contain only the bit-stripping operations. Or would that have to happen somewhere else?
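
For reference, here is a minimal C sketch of the kind of source that typically lowers to such chains. The helper names `tag_ptr`, `strip_tag` and `tagged_roundtrip` are hypothetical and not part of the patch, and the sketch assumes the pointee is at least 8-byte aligned, so that the mask `C` is -8. It keeps the tagged value as an integer to stay within well-defined C, whereas in the IR pattern `X` is itself a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: hide a 3-bit tag in the low bits of an address.
   Assumes p is at least 8-byte aligned, so those bits are zero. */
static uintptr_t tag_ptr(int64_t *p, unsigned tag) {
    return (uintptr_t)p | (uintptr_t)(tag & 7u);
}

/* Hypothetical helper: source-level analogue of
   inttoptr(and(ptrtoint(X), -8)); clears the 3 tag bits again. */
static int64_t *strip_tag(uintptr_t tagged) {
    return (int64_t *)(tagged & ~(uintptr_t)7);
}

/* Tags the address of a local, strips the tag, and stores through
   the recovered pointer. */
static int64_t tagged_roundtrip(void) {
    _Alignas(8) int64_t slot = 0;
    uintptr_t tagged = tag_ptr(&slot, 5);
    int64_t *q = strip_tag(tagged);
    assert(q == &slot);   /* the mask preserved all significant bits */
    *q = 42;              /* store goes to slot */
    return slot;
}
```

In a function like `tagged_roundtrip`, the only integer operations on the address are the ones that set and strip the tag bits, which is the case I have in mind for an early conversion.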

>>> That seems problematic for another reason:  IIUC you're saying `Alias(inttoptr(ptrtoint(X) & -8), A)` == `Alias(X, A)`.  But `X` is an illegal pointer so it does not alias anything (reads and writes through that pointer are illegal)?
>> 
>> Agreed, I think we would need to make this explicit in the langref.  X is illegal, if you consider all bits of the pointer. But the address space and alignment limit the relevant bits of the pointer, so I suppose we could specify that for logical pointers, only the bits in the limited range identify the pointed-to object.
> 
> I haven't thought this through but it still seems fishy to me: IIRC LLVM's alias predicate is defined as: *if* a write to X can be observed by Y (or vice versa) *then* X aliases Y. So two readonly locations A and B are both must-alias and no-alias:  there can never be a write to a readonly location, so the antecedent of the predicate is false (so both "A aliases B" and "A does not alias B" are true).  It seems like we have a similar situation here:  `X` is an illegal address that you can't load from or store to, and so both "X aliases A" and "X does not alias A" are true.  But `Alias(inttoptr(ptrtoint(X) & -8), A)` has a definite answer, since it is legal to load from/store to `inttoptr(ptrtoint(X) & -8)`.

Hm, if the definition is based on pointers directly and requires dereferenceability, that would indeed be tricky. If the predicate is defined based on memory locations, one might be able to argue that the memory location is only referenced by the valid bits. The documentation about must-alias/no-alias ( http://llvm.org/docs/AliasAnalysis.html#must-may-or-no ) is not that precise, I think. I think we would also have to resolve this question when using the intrinsic.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59065/new/

https://reviews.llvm.org/D59065