[llvm-dev] PartialAlias: different start addresses

Mon Jul 17 01:38:08 PDT 2017

Quoting Daniel Berlin <dberlin at dberlin.org>:

> On Sun, Jul 16, 2017 at 2:58 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>>
>>
>> On Sun, Jul 16, 2017 at 2:34 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
>>
>>> On Sun, Jul 16, 2017, 12:45 PM Nuno Lopes wrote:
>>>>>
>>>>>> On 07/15/2017 04:51 AM, Nuno Lopes wrote:
>>>>>>
>>>>>>> On 07/14/2017 04:37 PM, Nuno Lopes wrote:
>>>>>>>>
>>>>>>>>> Thank you all for your replies.
>>>>>>>>> So here seems to be an agreement that the documentation for
>>>>>>>>> PartialAlias is incorrect.
>>>>>>>>>
>>>>>>>>> Daniel: now you got me wondering about MustAlias. This is what the
>>>>>>>>> docs say:
>>>>>>>>> "The MustAlias response may only be returned if the two memory
>>>>>>>>> objects are *guaranteed to always start at exactly the same
>>>>>>>>> location*"
>>>>>>>>>
>>>>>>>>> This statement is regardless of the access sizes.  For example, in
>>>>>>>>> SCEV AA:
>>>>>>>>> // If they evaluate to the same expression, it's a MustAlias.
>>>>>>>>> if (AS == BS)
>>>>>>>>>  return MustAlias;
>>>>>>>>>
>>>>>>>>> AS/BS are scev expressions for the pointers. So no check for the
>>>>>>>>> access size.
>>>>>>>>>
>>>>>>>>> So, does must needs to check for access sizes?  If so, SCEV AA is
>>>>>>>>> buggy and the documentation needs tweaking.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm under the impression that there is code that depends on the size
>>>>>>>> check, but I don't trust my recollection in this regard. SCEV AA is
>>>>>>>> known to cause miscompiles, IIRC, maybe you just found out why ;)
>>>>>>>>
>>>>>>>
>>>>>>> It's true that the CFL AAs have this code:
>>>>>>> if (LocA.Ptr == LocB.Ptr)
>>>>>>>  return LocA.Size == LocB.Size ? MustAlias : PartialAlias;
>>>>>>>
>>>>>>>
>>>>>>> I grepped for clients of MustAlias:
>>>>>>> ~/llvm/lib/Transforms $ grep -Rl MustAlias .
>>>>>>> ./ObjCARC/ObjCARCOpts.cpp
>>>>>>> ./ObjCARC/ProvenanceAnalysis.cpp
>>>>>>> ./Scalar/DeadStoreElimination.cpp
>>>>>>> ./Scalar/GVN.cpp
>>>>>>> ./Scalar/LICM.cpp
>>>>>>> ./Scalar/LoopVersioningLICM.cpp
>>>>>>> ./Scalar/MemCpyOptimizer.cpp
>>>>>>> ./Scalar/MergedLoadStoreMotion.cpp
>>>>>>> ./Scalar/NewGVN.cpp
>>>>>>> ./Utils/VNCoercion.cpp
>>>>>>>
>>>>>>> I glanced over all the uses in these files and I couldn't find any
>>>>>>> usage that requires sizes to match.  Actually most clients check
>>>>>>> access sizes themselves. Most don't need equal sizes, just need one to
>>>>>>> be smaller than the other.
>>>>>>>
>>>>>>
>>>>>> Does aliasing actually check both ways?
>>>>>> Otherwise, alias (A, B) will give different results than alias (B, A).
>>>>>>
>>>>>
>>>>> Sorry for the delay.
>>>>> I'm not sure I understood what you wrote, sorry.  What you wrote is
>>>>> true in
>>>>> general, but I don't see how MustAlias in particular is worse than the
>>>>> other
>>>>> AA results.
>>>>>
>>>>
>>>> Historically, in llvm, we have guaranteed that alias(a, b) ==alias(b, a)
>>>>
>>>> If it does:
>>>>
>>>> If start (a) == start (b)
>>>>   If size(b) < size(a)
>>>>       Return mustalias
>>>>   Return may or partial
>>>>
>>>> It will give different answers for alias(a, b) and alias (b, a)
>>>>
>>>
>>> Wait, but the size check is done in the clients, not in AA.
>>> AA only does (according to my understanding):
>>>
>>> If start (a) == start (b)
>>>     Return mustalias
>>> Return may or partial
>>>
>>>
>> That seems ... wrong to me, but i haven't thought very hard about it.
>>
>
> In particular, this would mean the documentation for partialalias *is*
> correct :P.
> IE it will only return partial if the starting addresses are not the same.
>
> I actually would have expected something like:
>
> if start(a) == start (b):
>   if sizes equal:
>     return mustalias
>   else:
>     return partialalias
> if otherwise overlap
>    return partial
> return may
>
> That said, i feel like both our existing code is trying to encode too much
> info into these simple enum answers.
>
> Maybe we should really be having something like:
>
> enum aliaskind {
> MustAlias,  // Exactly the same start and size
> PartialAliasSameStart,
> PartialAlias,
> Mayalias
> }
> enum overlapkind {
> FullOverlap, // Either the accesses are the same size, or one fully covers
> the other
> PartialOverlap, // They partially cover each other
> Unknown // No idea
> }
>
> std::pair<aliaskind, overlapkind> alias(A, B) {
> if (start(A) == start(B))
>   if sizes unknown:
>      return {PartialAliasSameStart, Unknown}
>   if sizes equal:
>      return {MustAlias, FullOverlap}
>   if one smaller than the other
>      return {PartialAliasSameStart, FullOverlap}
> etc
>
> Then we don't have to muck around in the definitions of must/partial/may
> too much, and the clients don't have to do too much when, for example, the
> CSE/DSE like ones really want to know either "are these accessing exactly
> the same memory location" *or* "can i eliminate this load/store in favor of
> of an earlier/later one, maybe with some masking".
>
> But maybe this doesn't need cleaning up.

You are the AA expert, not me :)  I was just trying to formalize  
LLVM's AA stuff and noticed that some things were not very consistent.
BTW, we need to take into account over-approximation of the analyses.  
So we cannot/shouldn't say that PartialAlias only happens when  
addresses are different because this imposes extra burden on the AA:  
it would have to prove that.  I believe it's better if PartialAlias  
doesn't carry any information about the addresses. That way MustAlias  
implies PartialAlias.
Anyway, this depends on what clients actually need. I don't have  
enough experience in this area to make any judgement about that; I'll  
leave that with you :)

Nuno