[LLVMdev] Alias Analysis Semantics

Jeremy Salwen jeremysalwen at gmail.com
Wed Aug 20 17:58:46 PDT 2014


Hi Daniel,

Sorry for taking so long to respond.  I spoke with a colleague more
familiar with llvm who thought he could clear up my confusion, but we both
came out of the conversation confused.  I will try my best to explain the
ambiguity.

In an DAG, alias queries would be completely unambiguous.  Every
instruction would only be executed once, and every SSA value really would
have a single static assignment.  Any instruction would have a single value
per execution, and when you ask alias(x,y)  it would be a question of
whether those single memory addresses overlapped.

However, once backedges are introduced, and instruction can be executed
multiple times.  Every time the execution passes over a backedge, any
instructions that are executed again, may have completely different
values.  Now, when you ask alias(x,y), it's ambiguous which pairs of
addresses that query considers.

In some cases, it seems clear.  For example, in the loops we have been
talking about, it seems like you are saying that it only compares the
values in the same iteration, but does not say anything about aliasing
between iterations.  These are perfectly reasonable semantics for loops,
but it does not define what the alias queries mean in general.  For
example, consider the following CFG:


​If we ask a query alias(x,y) about instructions in basic blocks B and C
respectively, what will those answers mean?  There is always a backedge
between their executions so are they considered to be different iterations
and therefore they never alias?

I get that there may be some way of understanding the alias queries in
which "dynamic instructions" do not matter to the answer, but whatever that
way is, it must also be possible to use that way to explain things in terms
of the "dynamic instructions" that the CPU ends up executing.  And at least
for me that would resolve any confusion I have about the AliasAnalysis.

Thank you again for your time,
Jeremy


On Thu, Aug 14, 2014 at 10:46 AM, Daniel Berlin <dberlin at dberlin.org> wrote:

> Let me take your previous example in an attempt to hopefully explain why
> "dynamic instructions" do not matter to the answer:
> You asked about:
>
> > std::vector<int> A(100);
> >> int* x,y;
> >> x=&A[0];
> >>
> >> for(int i=0; i<100; i++) {
> >>    y=&A[i];
> >>    *y=*x;
> >>    x=&A[i+1];
> >> }
> >
>
> This is not what it looks like in LLVM.
>
> In LLVM, it looks like this:
>
> std::vector<int> A(100);
>> int* x,y;
>>
>
>>
> x_1=GEP A, 0, 0
>>
> for(int i=0; i<100; i++) {
>>
>     i_2 = phi (0, i_3)
>     x_2 = phi(x_1, x_3)
>     y_1 = GEP A, 0, i_2
>     temp = load x_2
>     store y_1, temp
>     temp2 = add i_2, 1
>     x_3 = GEP A, 0, temp2
>     i_3 = add i_2, 1
> }
>
> As you can see, every time you redefine the value of the pointer x to a
> new value, it gets a new name.
>
> alias(x_1, y_1) = mayalias
> alias(x_2, y_1) = mustalias
> alias(x_3, y_1) = mayalias
>
>
> The only interesting question, IMHO, is whether alias(x_3,y_1) is MayAlias
> or NoAlias.
> x_2 clearly has the same value as y_1 at all times throughout the program.
>
>
>
>
> On Thu, Aug 14, 2014 at 6:54 AM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>> On Thu, Aug 14, 2014 at 6:37 AM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>> > On Wed, Aug 13, 2014 at 8:35 PM, Jeremy Salwen <jeremysalwen at gmail.com>
>> wrote:
>> >> Hey Daniel,
>> >>
>> >> Thanks again for the help.  I'm still a bit confused about the
>> interface to
>> >> the alias analysis.   It seems like we are talking about different
>> >> interfaces.
>> > Sorry, yes, i was thinking of a different interface.
>> >
>> >> Has it changed from what the documentation says?  As far as I
>> >> can tell, the documentation takes a specific Value*, and no information
>> >> about which dynamic execution it is talking about.
>> >
>> > The actual standard low level interface is:
>> >
>> > virtual AliasResult alias(const Location &LocA, const Location &LocB);
>> > (There is a value wrapper for this)
>> >
>> > I think part of the confusion here is that you keep redefining
>> > pointers and expecting the results of queries to change.
>> >
>> > Outside of globals, LLVM is in SSA form.  Thus, these are queries will
>> > end up being about specific pointers that will only be defined once.
>> > In all of your examples, you have asked about local variables.  These
>> > local variables will be put in SSA.  Every time you redefine the
>> > pointer value, it will generate a new definition that is separate from
>> > the old ones.
>> > If you transform your examples above into globals, you may get
>> > different answers.
>> >
>> >
>> >
>> >>
>> >> When you say "Right. It's a query about specific loads and stores." it
>> >> sounds like you are saying that it does consider dynamic executions,
>> and
>> >> that you can make queries about specific dynamic executions.  I can't
>> seem
>> >> to find this option anywhere in the API..
>> > Yes, sorry, I was thinking of the MemoryDependence interface, which
>> > does give you that information.
>> > AliasAnalysis does not.
>> >
>> >>
>> >> Perhaps someone else could clear the confusion, because I feel like
>> there is
>> >> a gap of communication.
>> >
>> >
>> >>
>> >> For example, in this code what would the result be for the first store
>> and
>> >> the first load?
>> >
>> >>
>> >>> #include <cstdio>
>> >>> #include <cstdlib>
>> >>> int main(int c, char** argv) {
>> >>>  int* A=(int*)malloc(100*sizeof(int));
>> >>>
>> >>>  if(c>1) {
>> >>>   for(int i=0; i<5; i++) {
>> >>>     A[i]=5;
>> >>>   }
>> >>>  } else {
>> >>>   A[1]=5;
>> >>>  }
>> >>>  printf("%d\n",A[4]);
>> >>>  free(A);
>> >>> }
>> >>
>> >>
>> >
>> >> MustAlias?  Because whenever they are executed, they do alias.  Or
>> MayAlias?
>> > Since you seem to be having trouble going all the way down the rabbit
>> > hole here, let me take a different approach.
>> >
>> > In this example, A[i] is not an LLVM location or value *.
>> > You will never see A[i] in the LLVM IR to query about.
>> > You will see two pointers as this:
>> >    %i.0 = phi i32 [ 0, %4 ], [ %11, %10 ]
>> >    %8 = sext i32 %i.0 to i64
>> >    %9 = getelementptr inbounds i32* %2, i64 %8
>> >    store i32 5, i32* %9, align 4
>> >
>> > and
>> >  %16 = getelementptr inbounds i32* %2, i64 4
>> >    %17 = load i32* %16, align 4
>> >
>> > The query will be alias(%9, %16)
>> >
>>
>>
>> > In this case, it cannot prove it is MustAlias, but it is actually
>> MustAlias.
>> > It will, however, return MayAlias because it is not powerful enough to
>> prove it.
>>
>> Sigh, too early in the morning, pasted the right output, wrote the
>> answer while looking at the wrong pass output.
>> These two pointers will always be MayAlias.
>> LLVM later actually deletes the loop, and transforms the above into a
>> memset + a load from memset, and ends up with two mustalias pointers
>> in that case.
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140820/5a4ee3bf/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Untitled.png
Type: image/png
Size: 6953 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140820/5a4ee3bf/attachment.png>


More information about the llvm-dev mailing list