<div dir="ltr"><div>> There have been PDSE/etc patches posted to llvm-dev before that are<br>> probably a good start. </div><div> <br></div>Thanks for the info! :)<div><br></div><div>Juneyoung Lee</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Oct 3, 2018 at 12:00 AM Daniel Berlin <<a href="mailto:dberlin@dberlin.org">dberlin@dberlin.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, Oct 2, 2018 at 4:44 AM Juneyoung Lee <<a href="mailto:juneyoung.lee@sf.snu.ac.kr" target="_blank">juneyoung.lee@sf.snu.ac.kr</a>> wrote:<br>

><br>

> Hello Daniel,<br>

><br>

> Thank you for the reply.<br>

> My impression was that embedding such information into MemorySSA<br>

> would be helpful for shorter compilation time<br>

> as well as people who needs this kind of information for writing optimizations,<br>

> but building time of MemorySSA also should have been considered.<br>

<br>

Right now the building time of MemorySSA in LLVM has high cost because<br>

AA in LLVM is so slow.<br>

(It has no meaningful cost when built in GCC)<br>

<br>

It would be ... a significant undertaking to speed up AA in LLVM.<br>

<br>

(The underlying problem is that BasicAA is completely stateless, but<br>

does a tremendous amount of work.<br>

Thus it will walk over all the IR, but has no caching)<br>

<br>

><br>

> > I suspect you will find the cost is not worth it without a meaningful<br>

> > motivating optimization (like store PRE or something)<br>

><br>

> I think implementing store PRE for very simple cases seems interesting<br>

> (not only for MemorySSA, but the optimization itself as well).<br>

> I can give it a try to implement it if it is straightforward.<br>

> From which file should I start?<br>

<br>

There have been PDSE/etc patches posted to llvm-dev before that are<br>

probably a good start.<br>

> By the way, while looking through memory related optimizations, I found this patch -<br>

> <a href="https://reviews.llvm.org/D5422" rel="noreferrer" target="_blank">https://reviews.llvm.org/D5422</a> .<br>

> Would it make sense if store PRE deals with atomic operations<br>

> as well? I wonder whether people are very interested in<br>

> optimizing atomic operations / there are more patches like this.<br>

><br>

> Thank you,<br>

> Juneyoung Lee<br>

><br>

> On Tue, Oct 2, 2018 at 1:38 PM Daniel Berlin <<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>> wrote:<br>

>><br>

>> On Mon, Oct 1, 2018 at 9:03 PM Juneyoung Lee <<a href="mailto:juneyoung.lee@sf.snu.ac.kr" target="_blank">juneyoung.lee@sf.snu.ac.kr</a>> wrote:<br>

>> ><br>

>> ><br>

>> > Hello all,<br>

>> ><br>

>> > It seems that it is insufficient to<br>

>> > rely on MemorySSA to correctly reorder load/stores.<br>

>><br>

>> It depends on what you are trying to do.<br>

>><br>

>> ><br>

>> > For example, we have this following code:<br>

>> ><br>

>> > v = load q<br>

>> > store 10, p<br>

>> ><br>

>> > =><br>

>> ><br>

>> > store 10, p<br>

>> > v = load q // This is incorrect if p and q may alias<br>

>> ><br>

>> > In the MemorySSA representation however,<br>

>> > there's no syntactic dependency between load/store,<br>

>> > because MemorySSA before transformation would look like this:<br>

>> ><br>

>> > USE(M0) // corresponding to `v = load q`<br>

>> > M1 = DEF(M0) // corresponding to `store 10, p`<br>

>><br>

>> This is correct, MemorySSA, like SSA, represents upwards dependencies<br>

>> and not downwards ones.<br>

>><br>

>> Much like there is SSU and SSI, and SSA/SSU/SSI on the reverse flow<br>

>> graph, there are analogues for MemorySSA that we do not implement.<br>

>><br>

>> ><br>

>> > Reordering these two nodes does not break syntactic dependency,<br>

>> This is true in regular SSA as well :)<br>

>> It just happens that memory is different due to aliasing.<br>

>><br>

>> > hence additional flow-sensitive analysis is needed to check whether reordering is valid.<br>

>><br>

>> > To my understanding, GVNHoist seems to do that:<br>

>> > Memory operations in basic blocks are checked by<br>

>> > GVNHoist::hasMemoryUse().<br>

>> ><br>

>> Correct.<br>

>><br>

>> ><br>

>> > If USE nodes of MemorySSA also have left-hand side, we can correctly<br>

>> > represent this kind of reorderability IMHO.<br>

>> ><br>

>> > For the example above:<br>

>> ><br>

>> > U1 = USE(M0) // corresponding to `v = load q`<br>

>> > M1 = DEF(M0, U1) // corresponding to `store 10, p`<br>

>><br>

>> It is not this simple.<br>

>> You also need to avoid multiple reaching uses for a single def.<br>

>> IE<br>

>> U0 = Use(M0) // corresponding to `v = load q`<br>

>> U1 = Use(M1) // corresponding to `t = load r` where r aliases q aliases p<br>

>> M1 = DEF(M0, ???)<br>

>><br>

>> ??? must be U0, U1 above.<br>

>><br>

>> If you have the analogue of phis, you will require that analogue to<br>

>> merge the multiple reaching uses, and you may have multiple merges per<br>

>> block.<br>

>> IE<br>

>> U2 = merge(U0, U1)<br>

>> M1 = DEF (M0, U2)<br>

>><br>

>> ><br>

>> > Now there is a syntactic dependency between M1 and U1, hence<br>

>> > we can syntactically check whether reordering is allowed without any flow-sensitive analysis.<br>

>><br>

>> What's the goal?<br>

>> Much like SSA is meant to speed up certain dataflow problems, so is<br>

>> this. Something would have to be slow to make this worth it.<br>

>><br>

>> MemorySSA in particular is deliberately an overapproximation in LLVM<br>

>> (and GCC), because we discovered exactness is super-expensive,<br>

>> impossible, and going as far as you can buys you almost nothing in<br>

>> practice.<br>

>> However, it makes for a much faster memorydependenceanalysis (which in<br>

>> llvm is also only upwards and not downwards).<br>

>><br>

>> Here, it's not clear that GVNHoist is slow enough to make building an<br>

>> entire ReverseMemorySSA form worth it.<br>

>> In particular, building reverse MemorySSA will double the number of<br>

>> alias checks in most cases.<br>

>> You would almost certainly have to fix the BasicAA caching issues and<br>

>> finally split BasicAA or something to make it viable.<br>

>> (MemorySSA already pays a high price for BasicAA being super slow).<br>

>><br>

>> > If the store and load don't alias, we can remove U1 from DEF:<br>

>> ><br>

>> > U1 = USE(M0) // corresponding to `v = load q`<br>

>> > M1 = DEF(M0) // corresponding to `store 10, p`<br>

>> ><br>

>> > Then, the reordering is valid.<br>

>> ><br>

>> > One issue is regarding PHI node: possible solution is to make<br>

>> > two kinds of PHIS - PHIs for DEFs & PHIS for USEs, so there are<br>

>> > at most 2 PHIs.<br>

>> ><br>

>> ><br>

>> > Does this idea seem to make sense? Any thoughts?<br>

>> ><br>

>> It's certainly correct, i have considered doing it before but never<br>

>> had something to test out whether it was really worth the cost or not.<br>

>><br>

>> One previous blocker was that it requires a correct postdomtree to be<br>

>> able to do placement of use phis.<br>

>> That should now be solved.<br>

>><br>

>> I suspect you will find the cost is not worth it without a meaningful<br>

>> motivating optimization (like store PRE or something)<br>

><br>

><br>

><br>

> --<br>

><br>

> Juneyoung Lee<br>

> Software Foundation Lab, Seoul National University<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><br></div><font size="1">Juneyoung Lee</font><div><font size="1">Software Foundation Lab, Seoul National University</font></div></div></div>