[LLVMdev] Alias Analysis Semantics
Gerolf Hoflehner
ghoflehner at apple.com
Thu Aug 21 12:00:35 PDT 2014
I think I see the fundamental issue you are looking at. It is mentioned implicitly in the discussions, but not called out. In your CFG example there is no backedge (head always dominates tail), only retreat edges. So your graph is irreducible. For such graphs “simultaneous live” means that there be can be two dynamic execution paths where the variables (memory locations, objects etc) are simultaneously live, but they may not be live at the same time on all execution paths. Since LLVM uses SSA, all variables (memory locations, objects, … ) are strictly defined, so there cannot be irreducible dependence graphs. I believe this assumption is built into the alias queries. So to catch cases like in your scenario I think you need to base your queries on classical dataflow analysis.
-Gerolf
On Aug 21, 2014, at 4:18 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Jeremy Salwen" <jeremysalwen at gmail.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "llvmdev" <llvmdev at cs.uiuc.edu>, "Daniel Berlin" <dberlin at dberlin.org>
>> Sent: Thursday, August 21, 2014 12:07:03 AM
>> Subject: Re: [LLVMdev] Alias Analysis Semantics
>>
>>
>>
>>
>>
>> Hi Hal,
>>
>> Thank you for your email, that makes a lot of sense to me. I am
>> working on some tools to use memory profiling to speculatively
>> replace memory loads and stores with value forwarding in hardware
>> implementations. I'd like to compare the profiled data to static
>> alias analysis, so it would be super useful if there was a way to
>> answer the questions about aliasing across backedges that
>> AliasAnalysis considers meaningless. Perhaps the MemoryDependence
>> would allow this?
>
> MemoryDependenceAnalysis might do what you want (in the sense that it will do PHI translation and, IIRC, could tell you that along some execution path some store to x in B might be read by some load of y in C, for example). There is also a DependenceAnalysis analysis for loops that will give you inter-iteration dependence information. It might be easier to help you if you could provide some small example of the transformation you'd like to perform.
>
> -Hal
>
>>
>> Thank you,
>> Jeremy
>>
>>
>>
>>
>>
>>
>> On Wed, Aug 20, 2014 at 11:58 PM, Hal Finkel < hfinkel at anl.gov >
>> wrote:
>>
>>
>>
>>
>> ----- Original Message -----
>>> From: "Jeremy Salwen" < jeremysalwen at gmail.com >
>>> To: "Daniel Berlin" < dberlin at dberlin.org >
>>> Cc: "llvmdev" < llvmdev at cs.uiuc.edu >
>>> Sent: Wednesday, August 20, 2014 7:58:46 PM
>>> Subject: Re: [LLVMdev] Alias Analysis Semantics
>>>
>>> Hi Daniel,
>>>
>>> Sorry for taking so long to respond. I spoke with a colleague more
>>> familiar with llvm who thought he could clear up my confusion, but
>>> we both came out of the conversation confused. I will try my best
>>> to
>>> explain the ambiguity.
>>>
>>> In an DAG, alias queries would be completely unambiguous. Every
>>> instruction would only be executed once, and every SSA value really
>>> would have a single static assignment. Any instruction would have a
>>> single value per execution, and when you ask alias(x,y) it would be
>>> a question of whether those single memory addresses overlapped.
>>>
>>> However, once backedges are introduced, and instruction can be
>>> executed multiple times. Every time the execution passes over a
>>> backedge, any instructions that are executed again, may have
>>> completely different values. Now, when you ask alias(x,y), it's
>>> ambiguous which pairs of addresses that query considers.
>>>
>>> In some cases, it seems clear. For example, in the loops we have
>>> been
>>> talking about, it seems like you are saying that it only compares
>>> the values in the same iteration, but does not say anything about
>>> aliasing between iterations. These are perfectly reasonable
>>> semantics for loops, but it does not define what the alias queries
>>> mean in general. For example, consider the following CFG:
>>>
>>>
>>>
>>
>>> If we ask a query alias(x,y) about instructions in basic blocks B
>>> and
>>> C respectively, what will those answers mean? There is always a
>>> backedge between their executions so are they considered to be
>>> different iterations and therefore they never alias?
>>
>> Perhaps Danny will have a better answer, but I think about it this
>> way: AliasAnalysis only answers queries about variable relationships
>> at points along the dynamic execution path of the program. For a
>> query to be well formed, there much be some point (some instruction
>> in the LLVM IR) at which both values being queried are
>> simultaneously live. In your example, there is no block dominated by
>> both B and C, and so no point in the program where it would be legal
>> to refer to both x and y. As a result, the query itself is
>> meaningless. Do you have some transformation or analysis in mind
>> that would issue such a query?
>>
>> -Hal
>>
>>
>>
>>>
>>>
>>> I get that there may be some way of understanding the alias queries
>>> in which "dynamic instructions" do not matter to the answer, but
>>> whatever that way is, it must also be possible to use that way to
>>> explain things in terms of the "dynamic instructions" that the CPU
>>> ends up executing. And at least for me that would resolve any
>>> confusion I have about the AliasAnalysis.
>>>
>>>
>>> Thank you again for your time,
>>>
>>> Jeremy
>>>
>>>
>>>
>>>
>>> On Thu, Aug 14, 2014 at 10:46 AM, Daniel Berlin <
>>> dberlin at dberlin.org
>>>> wrote:
>>>
>>>
>>>
>>> Let me take your previous example in an attempt to hopefully
>>> explain
>>> why "dynamic instructions" do not matter to the answer:
>>> You asked about:
>>>
>>>
>>>> std::vector<int> A(100);
>>>>> int* x,y;
>>>>> x=&A[0];
>>>>>
>>>>> for(int i=0; i<100; i++) {
>>>>> y=&A[i];
>>>>> *y=*x;
>>>>> x=&A[i+1];
>>>>> }
>>>>
>>>
>>>
>>>
>>> This is not what it looks like in LLVM.
>>>
>>>
>>> In LLVM, it looks like this:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> std::vector<int> A(100);
>>>
>>> int* x,y;
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> x_1=GEP A, 0, 0
>>>
>>>
>>>
>>> for(int i=0; i<100; i++) {
>>> i_2 = phi (0, i_3)
>>> x_2 = phi(x_1, x_3)
>>> y_1 = GEP A, 0, i_2
>>> temp = load x_2
>>> store y_1, temp
>>> temp2 = add i_2, 1
>>> x_3 = GEP A, 0, temp2
>>> i_3 = add i_2, 1
>>> }
>>>
>>>
>>> As you can see, every time you redefine the value of the pointer x
>>> to
>>> a new value, it gets a new name.
>>>
>>>
>>> alias(x_1, y_1) = mayalias
>>> alias(x_2, y_1) = mustalias
>>> alias(x_3, y_1) = mayalias
>>>
>>>
>>>
>>>
>>> The only interesting question, IMHO, is whether alias(x_3,y_1) is
>>> MayAlias or NoAlias.
>>> x_2 clearly has the same value as y_1 at all times throughout the
>>> program.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Aug 14, 2014 at 6:54 AM, Daniel Berlin <
>>> dberlin at dberlin.org
>>>> wrote:
>>>
>>>
>>>
>>>
>>> On Thu, Aug 14, 2014 at 6:37 AM, Daniel Berlin <
>>> dberlin at dberlin.org
>>>> wrote:
>>>> On Wed, Aug 13, 2014 at 8:35 PM, Jeremy Salwen <
>>>> jeremysalwen at gmail.com > wrote:
>>>>> Hey Daniel,
>>>>>
>>>>> Thanks again for the help. I'm still a bit confused about the
>>>>> interface to
>>>>> the alias analysis. It seems like we are talking about different
>>>>> interfaces.
>>>> Sorry, yes, i was thinking of a different interface.
>>>>
>>>>> Has it changed from what the documentation says? As far as I
>>>>> can tell, the documentation takes a specific Value*, and no
>>>>> information
>>>>> about which dynamic execution it is talking about.
>>>>
>>>> The actual standard low level interface is:
>>>>
>>>> virtual AliasResult alias(const Location &LocA, const Location
>>>> &LocB);
>>>> (There is a value wrapper for this)
>>>>
>>>> I think part of the confusion here is that you keep redefining
>>>> pointers and expecting the results of queries to change.
>>>>
>>>> Outside of globals, LLVM is in SSA form. Thus, these are queries
>>>> will
>>>> end up being about specific pointers that will only be defined
>>>> once.
>>>> In all of your examples, you have asked about local variables.
>>>> These
>>>> local variables will be put in SSA. Every time you redefine the
>>>> pointer value, it will generate a new definition that is separate
>>>> from
>>>> the old ones.
>>>> If you transform your examples above into globals, you may get
>>>> different answers.
>>>>
>>>>
>>>>
>>>>>
>>>>> When you say "Right. It's a query about specific loads and
>>>>> stores." it
>>>>> sounds like you are saying that it does consider dynamic
>>>>> executions, and
>>>>> that you can make queries about specific dynamic executions. I
>>>>> can't seem
>>>>> to find this option anywhere in the API..
>>>> Yes, sorry, I was thinking of the MemoryDependence interface,
>>>> which
>>>> does give you that information.
>>>> AliasAnalysis does not.
>>>>
>>>>>
>>>>> Perhaps someone else could clear the confusion, because I feel
>>>>> like there is
>>>>> a gap of communication.
>>>>
>>>>
>>>>>
>>>>> For example, in this code what would the result be for the first
>>>>> store and
>>>>> the first load?
>>>>
>>>>>
>>>>>> #include <cstdio>
>>>>>> #include <cstdlib>
>>>>>> int main(int c, char** argv) {
>>>>>> int* A=(int*)malloc(100*sizeof(int));
>>>>>>
>>>>>> if(c>1) {
>>>>>> for(int i=0; i<5; i++) {
>>>>>> A[i]=5;
>>>>>> }
>>>>>> } else {
>>>>>> A[1]=5;
>>>>>> }
>>>>>> printf("%d\n",A[4]);
>>>>>> free(A);
>>>>>> }
>>>>>
>>>>>
>>>>
>>>>> MustAlias? Because whenever they are executed, they do alias. Or
>>>>> MayAlias?
>>>> Since you seem to be having trouble going all the way down the
>>>> rabbit
>>>> hole here, let me take a different approach.
>>>>
>>>> In this example, A[i] is not an LLVM location or value *.
>>>> You will never see A[i] in the LLVM IR to query about.
>>>> You will see two pointers as this:
>>>> %i.0 = phi i32 [ 0, %4 ], [ %11, %10 ]
>>>> %8 = sext i32 %i.0 to i64
>>>> %9 = getelementptr inbounds i32* %2, i64 %8
>>>> store i32 5, i32* %9, align 4
>>>>
>>>> and
>>>> %16 = getelementptr inbounds i32* %2, i64 4
>>>> %17 = load i32* %16, align 4
>>>>
>>>> The query will be alias(%9, %16)
>>>>
>>>
>>>
>>>> In this case, it cannot prove it is MustAlias, but it is actually
>>>> MustAlias.
>>>> It will, however, return MayAlias because it is not powerful
>>>> enough
>>>> to prove it.
>>>
>>> Sigh, too early in the morning, pasted the right output, wrote the
>>> answer while looking at the wrong pass output.
>>> These two pointers will always be MayAlias.
>>> LLVM later actually deletes the loop, and transforms the above into
>>> a
>>> memset + a load from memset, and ends up with two mustalias
>>> pointers
>>> in that case.
>>>
>>>
>>>
>>
>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140821/22677e38/attachment.html>
More information about the llvm-dev
mailing list