<div dir="ltr">(modulo this, btw, the code LGTM.I definitely don't think we should wait to resolve this before you commit it).<div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 23, 2016 at 1:32 PM, Daniel Berlin <span dir="ltr"><<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Ugh.<div>Okay, so.</div><div>A couple thoughts:<br><br></div><div>1. We could maintain it by doing batch use opt at the end of each pass or something, and then using the faster use optimizer to reoptimize affected loads (it's easy to tell the set of possibly affected loads).</div><div><br></div><div>2. We could drop it. We would still build an initially optimized form, but whatever happens past that, happens. We would make getClobberingMemoryAccess in the default walker modify loads to store the optimized result on them, and add a bool to MemoryUse to track which loads may need to be reoptimized (that gets reset in setDefiningAccess unless you tell it not to).</div><div><br>This will let you call getClobberingMemoryAccess on everything without it being really expensive for loads (right now it does a bunch of search/etc setup, and we need a way to avoid this)<br></div><div><br></div><div>The downside to all of this is<br><br></div><div>1. The walker now modifies loads. Even if you are storing the defining accesses somewhere, and we change it (not sure why this would happen, but ...), it should be okay, it would just mean you have a valid but conservative answer.</div><div><br></div><div>2. It means a stores uses will no longer include all possible loads that die there. It will only include some, unless you run around and call getClobberingMemoryAccess on every load.<br></div><div><br></div><div>I will have to do that in NewGVN to make load/store forwarding work sanely, but i'll deal with that.</div><div><br></div><div><br></div><div>+ a few others for thoughts.</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 23, 2016 at 1:04 PM, Geoff Berry <span dir="ltr"><<a href="mailto:gberry@codeaurora.org" target="_blank">gberry@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div><div>
<p><br>
</p>
<br>
<div>On 8/23/2016 2:17 PM, Geoff Berry via
llvm-commits wrote:<br>
</div>
<blockquote type="cite">
<p><br>
</p>
<br>
<div>On 8/23/2016 1:47 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Aug 23, 2016 at 10:17 AM,
Geoff Berry <span dir="ltr"><<a href="mailto:gberry@codeaurora.org" target="_blank">gberry@codeaurora.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">gberry
added inline comments.<br>
<span><br>
================<br>
Comment at: lib/Transforms/Scalar/EarlyCSE<wbr>.cpp:537<br>
@@ +536,3 @@<br>
+ MemoryAccess *LaterDef =<br>
+ MSSA->getWalker()->getClobberi<wbr>ngMemoryAccess(LaterInst);<br>
+ return MSSA->dominates(LaterDef,
MSSA->getMemoryAccess(EarlierI<wbr>nst));<br>
----------------<br>
</span><span>dberlin wrote:<br>
> gberry wrote:<br>
> > dberlin wrote:<br>
> > > 1. For loads, you don't have to ask for
the clobbering access. It's already optimized such
that getDefiningAccess == the clobbering access<br>
> > ><br>
> > > 2. For stores, not sure if you realize
this, but<br>
> > ><br>
> > > given<br>
> > > store q (lets's call this a)<br>
> > > x = load p<br>
> > > store q (let's call this b)<br>
> > ><br>
> > ><br>
> > > if you call getClobberingMemoryAccess
on b, it will return a.<br>
> > ><br>
> > ><br>
> > For 1., I was not clear on whether this
holds true after store removal.<br>
> ><br>
> > For 2., yeah I get this, I'm not sure what
you're getting at though. The removal of this second
store by EarlyCSE doesn't use MemorySSA to check for
intervening loads in this change. It uses the
'LastStore' tracking to know when a store made
redundant by a second store can be removed.<br>
> 1. Updates have to make it hold after store
removal :)<br>
><br>
> The problem is that if we don't keep this
invariant up to date, it means everyone uses
getClobberingAccess, which does a bunch of work to
discover the load already points to the same thing.<br>
><br>
> Everyone doing that is much higher than the cost
of one person updating the dominating def.<br>
><br>
> (there is one case were getClobberingAccess will
give you a better answer, and that is on cases where
we gave up during use optimization. I only have one
testcase this occurs on. We only give up on
optimizing a load if it's going to be super expensive,
and you probably do *not* want to try to get better
answers in that case).<br>
><br>
> As for updating when you remove stores, you
should simply be able to replace any loads the store
uses with getClobberingAccess(store) using RAUW.<br>
><br>
> Under the covers, removeMemoryAccess calls RAUW
with the DefiningAccess.<br>
> We could change it to use
getClobberingMemoryAccess for loads, and
DefiningAccess for stores.<br>
><br>
> 2. ah, okay.<br>
><br>
><br>
</span>Okay, I get why just checking the defining access
for loads is better (we get to skip the AA check).<br>
</blockquote>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> For
stores, we may be able to do something faster than
calling getClobberingAccess(store). We could instead
walk up the store defining access chain and stop if we
get to a point that dominates the earlier load or a
clobbering write. I'll have to time this to see if it
makes a difference. I guess it will depend on what
percentage of the time the clobber cache has been thrown
away.<br>
</blockquote>
<div><br>
</div>
<div>Yes. You are basically designing a custom walker
here, and that's great.</div>
<div>If it matters, I would just make it a walker class
and </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
As for updating when removing stores: it seems like
doing RAUW getClobberingAccess(store) is not optimal in
some cases. </blockquote>
<div><br>
Just out of curiosity do you have a real example of the
below? We work really hard to avoid generating that.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">For
example:<br>
<br>
store @G1 ; 1 = MD(entry)<br>
store @G2 ; 2 = MD(1)<br>
store %p ; 3 = MD(2)<br>
load @G1 ; MU(3)<br>
load @G2 ; MU(3)<br>
</blockquote>
<div><br>
</div>
<div>These should not be MU(3), they should be MU(1) and
MU(2), unless the store is *actually* a clobber for
them. If the store is actually a clobber, i don't see
how you could legally remove or move the store (since it
conflicts with G1/G2) :)</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
If we remove 3 and RUAW getClobberingAccess(3) (=2) we
get:<br>
<br>
store @G1 ; 1 = MD(entry)<br>
store @G2 ; 2 = MD(1)<br>
load @G1 ; MU(2)<br>
load @G2 ; MU(2)</blockquote>
<div><br>
</div>
<div>I agree that if you can get memoryssa to generate the
first, you will get the second and it will be
non-optimal. If you can, and it's legal to move or
remove the store, that seems like a super-rare case
unless i'm missing something?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Just to reply to this one particular point for now, here's a case
where this can happen because of EarlyCSE removing a store that is
simply storing back the value just loaded from the same memory
location:<br>
<br>
<br>
define void @test3(i32* %p) {<br>
entry:<br>
; 1 = MemoryDef(liveOnEntry)<br>
store i32 1, i32* @G1<br>
; 2 = MemoryDef(1)<br>
store i32 2, i32* @G2<br>
; MemoryUse(2)<br>
%xp = load i32, i32* %p<br>
; 3 = MemoryDef(2)<br>
store i32 %xp, i32* %p<br>
; MemoryUse(3)<br>
%x1 = load i32, i32* @G1<br>
; MemoryUse(3)<br>
%x2 = load i32, i32* @G2<br>
; 4 = MemoryDef(3)<br>
call void @foo(i32 %x1, i32 %x2, i32 %xp)<br>
ret void<br>
}<br>
<br>
Removing: store i32 %xp, i32* %p<br>
<br>
define void @test3(i32* %p) {<br>
entry:<br>
; 1 = MemoryDef(liveOnEntry)<br>
store i32 1, i32* @G1<br>
; 2 = MemoryDef(1)<br>
store i32 2, i32* @G2<br>
; MemoryUse(2)<br>
%xp = load i32, i32* %p<br>
; MemoryUse(2)<br>
%x1 = load i32, i32* @G1<br>
; MemoryUse(2)<br>
%x2 = load i32, i32* @G2<br>
; 4 = MemoryDef(2)<br>
call void @foo(i32 %x1, i32 %x2, i32 %xp)<br>
ret void<br>
}<br>
<br>
</blockquote>
<br></div></div>
... and to push this case a bit further, it seems like it wouldn't
be too hard to construct a case where removing this store would
necessitate looking down through MemoryPhis in order to maintain the
optimized-use invariant, possibly even leading to more MemoryPhi
optimization?<br>
<br>
<blockquote type="cite"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>If you have a testcase with a real example, that
would be super-interesting to me.<br>
</div>
<div>Because if we can't get it to generate precise info
in most if not all cases, it's possibly not worth doing
the work we are doing to optimize uses.</div>
<div><br>
</div>
<div>I don't claim we make the perfect set of tradeoffs
right now. I know for the use cases i cared about, it
makes an IMHO good set of choices.</div>
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
</blockquote>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> but
the load @G1 would be more precise if it was MU(1) (and
the invariant that defining access == clobbering access
would be broken). Is this just a compile-time/precision
trade-off? </blockquote>
<div><br>
</div>
<div>Kinda. The main problem is it affects everyone
downstream *right now* (again, we could change this at
the cost of having the walker mutate loads).</div>
<div><br>
</div>
<div>Imagine something upstream of you generates imprecise
info for loads.</div>
<div>You use getClobberingAccess to work around it,
because you want the best answer.</div>
<div><br>
Assume everything preserves MemorySSA.</div>
<div>We still can't really cache this sanely with a
side-cache: either it gets blown away, or takes up too
much space.</div>
<div>Caching the result of every load without rewriting is
another O(N) space factor (we did it before, and george
had to undo it :P).</div>
<div><br>
</div>
<div>So now you wasted a bunch of time looking up the real
clobbering result for the load.</div>
<div>So will the next pass, and the pass after that, and
...</div>
<div><br>
</div>
<div>Stores get pushed *upwards* a lot less than loads, so
calling getClobberingMemoryAccess on them is more rare,
and caching them is more sane, and "misses" are lower
cost.</div>
<div><br>
</div>
<div>For loads, we can work around having to have a
separate cache by rewriting the defining access in the
main walker as a cache (and marking them as optimized
with a single bit or something, so we know not to waste
time doing it again until the defining access gets reset
again), but then the walker is mutating the MemorySSA,
which is "not ideal" if we can avoid it.</div>
<div><br>
</div>
<div>If we can't, such is life, i'm basically trying to
kick this can as far down the road as i can before we
use that solution :)</div>
<div><br>
</div>
<div>But y'all are the ones doing a lot of the work at
this point, so let me know if you think we should stop
trying to maintain these invariants.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Maybe
for that reason it makes more sense to let the client
decide if they want to do the simple RAUW
getClobberingAccess(Store) or optimize each use
separately?<br>
<br>
</blockquote>
<div><br>
If we can make it come out optimal for everyone, we
should do that.</div>
<div>Otherwise, one client screws up another :)<br>
</div>
<div><br>
</div>
<div>If we can't, yeah, sure, then we start making
tradeoffs.</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre cols="72">--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
<br>
<fieldset></fieldset>
<br>
</span><pre>______________________________<wbr>_________________
llvm-commits mailing list
<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a>
</pre>
</blockquote><span>
<br>
<pre cols="72">--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
</span></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>