[llvm-dev] Reordering of load/stores using MemorySSA

Tue Oct 2 08:02:18 PDT 2018

> There have been PDSE/etc patches posted to llvm-dev before that are
> probably a good start.

Thanks for the info! :)

Juneyoung Lee

On Wed, Oct 3, 2018 at 12:00 AM Daniel Berlin <dberlin at dberlin.org> wrote:

> On Tue, Oct 2, 2018 at 4:44 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
> wrote:
> >
> > Hello Daniel,
> >
> > Thank you for the reply.
> > My impression was that embedding such information into MemorySSA
> > would be helpful for shorter compilation time
> > as well as people who needs this kind of information for writing
> optimizations,
> > but building time of MemorySSA also should have been considered.
>
> Right now the building time of MemorySSA in LLVM has high cost because
> AA in LLVM is so slow.
> (It has no meaningful cost when built in GCC)
>
> It would be ... a significant undertaking to speed up AA in LLVM.
>
> (The underlying problem is that BasicAA is completely stateless, but
> does a tremendous amount of work.
> Thus it will walk over all the IR, but has no caching)
>
> >
> > > I suspect you will find the cost is not worth it without a meaningful
> > > motivating optimization (like store PRE or something)
> >
> > I think implementing store PRE for very simple cases seems interesting
> > (not only for MemorySSA, but the optimization itself as well).
> > I can give it a try to implement it if it is straightforward.
> > From which file should I start?
>
> There have been PDSE/etc patches posted to llvm-dev before that are
> probably a good start.
> > By the way, while looking through memory related optimizations, I found
> this patch -
> > https://reviews.llvm.org/D5422 .
> > Would it make sense if store PRE deals with atomic operations
> > as well? I wonder whether people are very interested in
> > optimizing atomic operations / there are more patches like this.
> >
> > Thank you,
> > Juneyoung Lee
> >
> > On Tue, Oct 2, 2018 at 1:38 PM Daniel Berlin <dberlin at dberlin.org>
> wrote:
> >>
> >> On Mon, Oct 1, 2018 at 9:03 PM Juneyoung Lee <
> juneyoung.lee at sf.snu.ac.kr> wrote:
> >> >
> >> >
> >> > Hello all,
> >> >
> >> > It seems that it is insufficient to
> >> > rely on MemorySSA to correctly reorder load/stores.
> >>
> >> It depends on what you are trying to do.
> >>
> >> >
> >> > For example, we have this following code:
> >> >
> >> > v = load q
> >> > store 10, p
> >> >
> >> > =>
> >> >
> >> > store 10, p
> >> > v = load q // This is incorrect if p and q may alias
> >> >
> >> > In the MemorySSA representation however,
> >> > there's no syntactic dependency between load/store,
> >> > because MemorySSA before transformation would look like this:
> >> >
> >> > USE(M0) // corresponding to `v = load q`
> >> > M1 = DEF(M0) // corresponding to `store 10, p`
> >>
> >> This is correct, MemorySSA, like SSA, represents upwards dependencies
> >> and not downwards ones.
> >>
> >> Much like there is SSU and SSI, and SSA/SSU/SSI on the reverse flow
> >> graph, there are analogues for MemorySSA that we do not implement.
> >>
> >> >
> >> > Reordering these two nodes does not break syntactic dependency,
> >> This is true in regular SSA as well :)
> >> It just happens that memory is different due to aliasing.
> >>
> >> > hence additional flow-sensitive analysis is needed to check whether
> reordering is valid.
> >>
> >> > To my understanding, GVNHoist seems to do that:
> >> > Memory operations in basic blocks are checked by
> >> > GVNHoist::hasMemoryUse().
> >> >
> >> Correct.
> >>
> >> >
> >> > If USE nodes of MemorySSA also have left-hand side, we can correctly
> >> > represent this kind of reorderability IMHO.
> >> >
> >> > For the example above:
> >> >
> >> > U1 = USE(M0) // corresponding to `v = load q`
> >> > M1 = DEF(M0, U1) // corresponding to `store 10, p`
> >>
> >> It is not this simple.
> >> You also need to avoid multiple reaching uses for a single def.
> >> IE
> >> U0 = Use(M0) // corresponding to `v = load q`
> >> U1 = Use(M1) // corresponding to `t = load r` where r aliases q aliases
> p
> >> M1 = DEF(M0, ???)
> >>
> >> ??? must be U0, U1 above.
> >>
> >> If you have the analogue of phis, you will require that analogue to
> >> merge the multiple reaching uses, and you may have multiple merges per
> >> block.
> >> IE
> >> U2 = merge(U0, U1)
> >> M1 = DEF (M0, U2)
> >>
> >> >
> >> > Now there is a syntactic dependency between M1 and U1, hence
> >> > we can syntactically check whether reordering is allowed without any
> flow-sensitive analysis.
> >>
> >> What's the goal?
> >> Much like SSA is meant to speed up certain dataflow problems, so is
> >> this. Something would have to be slow to make this worth it.
> >>
> >> MemorySSA in particular is deliberately an overapproximation in LLVM
> >> (and GCC), because we discovered exactness is super-expensive,
> >> impossible, and going as far as you can buys you almost nothing in
> >> practice.
> >> However, it makes for a much faster memorydependenceanalysis (which in
> >> llvm is also only upwards and not downwards).
> >>
> >> Here, it's not clear that GVNHoist is slow enough to make building an
> >> entire ReverseMemorySSA form worth it.
> >> In particular, building reverse MemorySSA will double the number of
> >> alias checks in most cases.
> >> You would almost certainly have to fix the BasicAA caching issues and
> >> finally split BasicAA or something to make it viable.
> >> (MemorySSA already pays a high price for BasicAA being super slow).
> >>
> >> > If the store and load don't alias, we can remove U1 from DEF:
> >> >
> >> > U1 = USE(M0) // corresponding to `v = load q`
> >> > M1 = DEF(M0) // corresponding to `store 10, p`
> >> >
> >> > Then, the reordering is valid.
> >> >
> >> > One issue is regarding PHI node: possible solution is to make
> >> > two kinds of PHIS - PHIs for DEFs & PHIS for USEs, so there are
> >> > at most 2 PHIs.
> >> >
> >> >
> >> > Does this idea seem to make sense? Any thoughts?
> >> >
> >> It's certainly correct, i have considered doing it before but never
> >> had something to test out whether it was really worth the cost or not.
> >>
> >> One previous blocker was that it requires a correct postdomtree to be
> >> able to do placement of use phis.
> >> That should now be solved.
> >>
> >> I suspect you will find the cost is not worth it without a meaningful
> >> motivating optimization (like store PRE or something)
> >
> >
> >
> > --
> >
> > Juneyoung Lee
> > Software Foundation Lab, Seoul National University
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181003/fd65187a/attachment.html>