[llvm-dev] Reordering of load/stores using MemorySSA

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 2 08:00:12 PDT 2018


On Tue, Oct 2, 2018 at 4:44 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> wrote:
>
> Hello Daniel,
>
> Thank you for the reply.
> My impression was that embedding such information into MemorySSA
> would help shorten compilation time,
> and would also help people who need this kind of information when writing optimizations,
> but the build time of MemorySSA itself should also have been considered.

Right now, building MemorySSA in LLVM is expensive because
AA in LLVM is so slow.
(Building it in GCC has no meaningful cost.)

It would be ... a significant undertaking to speed up AA in LLVM.

(The underlying problem is that BasicAA is completely stateless, yet
does a tremendous amount of work:
it walks over all of the IR on every query, but caches nothing.)
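A minimal sketch of the caching idea, in Python rather than LLVM's real API (the names `basic_aa` and `cached_aa` are invented stand-ins): a stateless query re-does its walk on every call, while a memoized wrapper runs the expensive work only once per distinct query.

```python
# Hypothetical sketch, not LLVM's real API: BasicAA is stateless, so
# every alias query re-walks the IR; a per-function cache collapses
# repeated queries to one walk.
from functools import lru_cache

query_count = 0  # how many times the expensive walk actually runs

def basic_aa(p, q):
    """Stand-in for a stateless alias query: 'walks the IR' each call."""
    global query_count
    query_count += 1
    # Toy rule: two pointers may alias iff they share an underlying object.
    return "may-alias" if p[0] == q[0] else "no-alias"

@lru_cache(maxsize=None)
def cached_aa(p, q):
    return basic_aa(p, q)

# 1000 identical queries -> the underlying walk runs only once.
for _ in range(1000):
    cached_aa(("obj", 0), ("obj", 8))
print(query_count)  # 1
```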

>
> > I suspect you will find the cost is not worth it without a meaningful
> > motivating optimization (like store PRE or something)
>
> I think implementing store PRE for very simple cases would be interesting
> (not only for MemorySSA, but for the optimization itself as well).
> I can try implementing it if it is straightforward.
> Which file should I start from?

There have been PDSE/etc patches posted to llvm-dev before that are
probably a good start.
> By the way, while looking through memory related optimizations, I found this patch -
> https://reviews.llvm.org/D5422 .
> Would it make sense for store PRE to handle atomic operations
> as well? I wonder whether people are interested in
> optimizing atomic operations, and whether there are more patches like this.
>
> Thank you,
> Juneyoung Lee
>
> On Tue, Oct 2, 2018 at 1:38 PM Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> On Mon, Oct 1, 2018 at 9:03 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr> wrote:
>> >
>> >
>> > Hello all,
>> >
>> > It seems that relying on MemorySSA alone is insufficient
>> > for correctly reordering loads/stores.
>>
>> It depends on what you are trying to do.
>>
>> >
>> > For example, we have this following code:
>> >
>> > v = load q
>> > store 10, p
>> >
>> > =>
>> >
>> > store 10, p
>> > v = load q // This is incorrect if p and q may alias
>> >
>> > In the MemorySSA representation, however,
>> > there is no syntactic dependency between the load and the store,
>> > because the MemorySSA before the transformation would look like this:
>> >
>> > USE(M0) // corresponding to `v = load q`
>> > M1 = DEF(M0) // corresponding to `store 10, p`
>>
>> This is correct, MemorySSA, like SSA, represents upwards dependencies
>> and not downwards ones.
>>
>> Much like there is SSU and SSI, and SSA/SSU/SSI on the reverse flow
>> graph, there are analogues for MemorySSA that we do not implement.
>>
>> >
>> > Reordering these two nodes does not break syntactic dependency,
>> This is true in regular SSA as well :)
>> It just happens that memory is different due to aliasing.
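The hazard under discussion can be sketched as follows (illustrative Python only, not LLVM code; the `alias` oracle here is a toy assumption standing in for AA): moving a load across a store is only sound when the two pointers provably do not alias.

```python
# Illustrative model of the reordering check, not LLVM's implementation.
def alias(p, q):
    # Toy oracle: pointers alias iff they name the same object.
    return "may-alias" if p == q else "no-alias"

def can_reorder_load_across_store(load_ptr, store_ptr):
    # `v = load q` may move past `store 10, p` only on a no-alias proof.
    return alias(load_ptr, store_ptr) == "no-alias"

print(can_reorder_load_across_store("q", "p"))  # True: distinct objects
print(can_reorder_load_across_store("p", "p"))  # False: same memory
```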
>>
>> > hence additional flow-sensitive analysis is needed to check whether reordering is valid.
>>
>> > To my understanding, GVNHoist seems to do that:
>> > Memory operations in basic blocks are checked by
>> > GVNHoist::hasMemoryUse().
>> >
>> Correct.
>>
>> >
>> > If USE nodes of MemorySSA also had a left-hand side, we could correctly
>> > represent this kind of reorderability, IMHO.
>> >
>> > For the example above:
>> >
>> > U1 = USE(M0) // corresponding to `v = load q`
>> > M1 = DEF(M0, U1) // corresponding to `store 10, p`
>>
>> It is not this simple.
>> You also need to avoid multiple reaching uses for a single def.
>> IE
>> U0 = Use(M0) // corresponding to `v = load q`
>> U1 = Use(M0) // corresponding to `t = load r`, where r aliases q aliases p
>> M1 = DEF(M0, ???)
>>
>> ??? must be both U0 and U1 above.
>>
>> If you have the analogue of phis, you will require that analogue to
>> merge the multiple reaching uses, and you may have multiple merges per
>> block.
>> IE
>> U2 = merge(U0, U1)
>> M1 = DEF (M0, U2)
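The merge scheme described above can be sketched in Python (the names `build_downward_deps` and the list encoding are invented for illustration, not part of LLVM): within a block, every USE since the last DEF is merged into a single operand of the next DEF, so downward dependencies become syntactic.

```python
# Sketch (with invented names) of merging reaching USEs into a DEF,
# assuming pessimistically that every pointer may alias.
def build_downward_deps(insts):
    """insts: ("load", ptr) / ("store", ptr) tuples in program order.
    Returns one (store_index, merged_use_indices) pair per store."""
    pending_uses = []   # USEs not yet attached to any DEF
    deps = []
    for i, (kind, _ptr) in enumerate(insts):
        if kind == "load":
            pending_uses.append(i)
        else:  # a store: merge all reaching USEs into its DEF operand
            deps.append((i, tuple(pending_uses)))
            pending_uses = []  # later loads depend on the new DEF instead
    return deps

# U0 = load q, U1 = load r, M1 = DEF(M0, merge(U0, U1))
print(build_downward_deps([("load", "q"), ("load", "r"), ("store", "p")]))
# [(2, (0, 1))]
```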
>>
>> >
>> > Now there is a syntactic dependency between M1 and U1, hence
>> > we can syntactically check whether reordering is allowed without any flow-sensitive analysis.
>>
>> What's the goal?
>> Much like SSA is meant to speed up certain dataflow problems, so is
>> this. Something would have to be slow to make this worth it.
>>
>> MemorySSA in particular is deliberately an overapproximation in LLVM
>> (and GCC), because we discovered that exactness is super-expensive,
>> sometimes impossible, and that going as far as you can buys you almost
>> nothing in practice.
>> However, it makes for a much faster MemoryDependenceAnalysis (which in
>> LLVM is also only upwards, not downwards).
>>
>> Here, it's not clear that GVNHoist is slow enough to make building an
>> entire ReverseMemorySSA form worth it.
>> In particular, building reverse MemorySSA will double the number of
>> alias checks in most cases.
>> You would almost certainly have to fix the BasicAA caching issues and
>> finally split BasicAA or something to make it viable.
>> (MemorySSA already pays a high price for BasicAA being super slow).
>>
>> > If the store and load don't alias, we can remove U1 from DEF:
>> >
>> > U1 = USE(M0) // corresponding to `v = load q`
>> > M1 = DEF(M0) // corresponding to `store 10, p`
>> >
>> > Then, the reordering is valid.
>> >
>> > One issue regards PHI nodes: a possible solution is to make
>> > two kinds of PHIs - PHIs for DEFs and PHIs for USEs - so there are
>> > at most 2 PHIs per block.
>> >
>> >
>> > Does this idea seem to make sense? Any thoughts?
>> >
>> It's certainly correct; I have considered doing it before, but never
>> had anything with which to test whether it was really worth the cost.
>>
>> One previous blocker was that it requires a correct postdomtree to be
>> able to do placement of use phis.
>> That should now be solved.
>>
>> I suspect you will find the cost is not worth it without a meaningful
>> motivating optimization (like store PRE or something)
>
>
>
> --
>
> Juneyoung Lee
> Software Foundation Lab, Seoul National University

