<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Sounds good. I'll go ahead and check in what I have now (since
it is NFC) and we can proceed from there.</p>
<p>I'm currently trying out an approach where upon removing a store
I eagerly re-optimize phis, but just mark uses as dirty, then only
re-compute the clobber for dirty uses if they come up in a
isSameMemGeneration query or at the end of the pass, which is sort
of a hybrid of the two approaches you described.<br>
</p>
<br>
<div class="moz-cite-prefix">On 8/24/2016 7:34 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote
cite="mid:CAF4BwTXACdw-Lnhn6hSab9U5LXeNd1CTgMAtWLci3j4nX7JKkg@mail.gmail.com"
type="cite">
<div dir="ltr">(modulo this, btw, the code LGTM.I definitely don't
think we should wait to resolve this before you commit it).
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Aug 23, 2016 at 1:32 PM, Daniel
Berlin <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Ugh.
<div>Okay, so.</div>
<div>A couple thoughts:<br>
<br>
</div>
<div>1. We could maintain it by doing batch use opt at the
end of each pass or something, and then using the faster
use optimizer to reoptimize affected loads (it's easy to
tell the set of possibly affected loads).</div>
<div><br>
</div>
<div>2. We could drop it. We would still build an
initially optimized form, but whatever happens past
that, happens. We would make getClobberingMemoryAccess
in the default walker modify loads to store the
optimized result on them, and add a bool to MemoryUse to
track which loads may need to be reoptimized (that gets
reset in setDefiningAccess unless you tell it not to).</div>
<div><br>
This will let you call getClobberingMemoryAccess on
everything without it being really expensive for loads
(right now it does a bunch of search/etc setup, and we
need a way to avoid this)<br>
</div>
<div><br>
</div>
<div>The downside to all of this is<br>
<br>
</div>
<div>1. The walker now modifies loads. Even if you are
storing the defining accesses somewhere, and we change
it (not sure why this would happen, but ...), it should
be okay, it would just mean you have a valid but
conservative answer.</div>
<div><br>
</div>
<div>2. It means a stores uses will no longer include all
possible loads that die there. It will only include
some, unless you run around and call
getClobberingMemoryAccess on every load.<br>
</div>
<div><br>
</div>
<div>I will have to do that in NewGVN to make load/store
forwarding work sanely, but i'll deal with that.</div>
<div><br>
</div>
<div><br>
</div>
<div>+ a few others for thoughts.</div>
<div><br>
</div>
</div>
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Aug 23, 2016 at 1:04
PM, Geoff Berry <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:gberry@codeaurora.org"
target="_blank">gberry@codeaurora.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<div>
<p><br>
</p>
<br>
<div>On 8/23/2016 2:17 PM, Geoff Berry via
llvm-commits wrote:<br>
</div>
<blockquote type="cite">
<p><br>
</p>
<br>
<div>On 8/23/2016 1:47 PM, Daniel Berlin
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Aug
23, 2016 at 10:17 AM, Geoff Berry
<span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:gberry@codeaurora.org"
target="_blank">gberry@codeaurora.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">gberry
added inline comments.<br>
<span><br>
================<br>
Comment at:
lib/Transforms/Scalar/EarlyCSE<wbr>.cpp:537<br>
@@ +536,3 @@<br>
+ MemoryAccess *LaterDef =<br>
+
MSSA->getWalker()->getClobberi<wbr>ngMemoryAccess(LaterInst);<br>
+ return
MSSA->dominates(LaterDef,
MSSA->getMemoryAccess(EarlierI<wbr>nst));<br>
----------------<br>
</span><span>dberlin wrote:<br>
> gberry wrote:<br>
> > dberlin wrote:<br>
> > > 1. For loads,
you don't have to ask for the
clobbering access. It's
already optimized such that
getDefiningAccess == the
clobbering access<br>
> > ><br>
> > > 2. For stores,
not sure if you realize this,
but<br>
> > ><br>
> > > given<br>
> > > store q
(lets's call this a)<br>
> > > x = load p<br>
> > > store q (let's
call this b)<br>
> > ><br>
> > ><br>
> > > if you call
getClobberingMemoryAccess on
b, it will return a.<br>
> > ><br>
> > ><br>
> > For 1., I was not
clear on whether this holds
true after store removal.<br>
> ><br>
> > For 2., yeah I get
this, I'm not sure what you're
getting at though. The
removal of this second store
by EarlyCSE doesn't use
MemorySSA to check for
intervening loads in this
change. It uses the
'LastStore' tracking to know
when a store made redundant by
a second store can be removed.<br>
> 1. Updates have to make
it hold after store removal :)<br>
><br>
> The problem is that if we
don't keep this invariant up
to date, it means everyone
uses getClobberingAccess,
which does a bunch of work to
discover the load already
points to the same thing.<br>
><br>
> Everyone doing that is
much higher than the cost of
one person updating the
dominating def.<br>
><br>
> (there is one case were
getClobberingAccess will give
you a better answer, and that
is on cases where we gave up
during use optimization. I
only have one testcase this
occurs on. We only give up on
optimizing a load if it's
going to be super expensive,
and you probably do *not* want
to try to get better answers
in that case).<br>
><br>
> As for updating when you
remove stores, you should
simply be able to replace any
loads the store uses with
getClobberingAccess(store)
using RAUW.<br>
><br>
> Under the covers,
removeMemoryAccess calls RAUW
with the DefiningAccess.<br>
> We could change it to use
getClobberingMemoryAccess for
loads, and DefiningAccess for
stores.<br>
><br>
> 2. ah, okay.<br>
><br>
><br>
</span>Okay, I get why just
checking the defining access for
loads is better (we get to skip
the AA check).<br>
</blockquote>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> For
stores, we may be able to do
something faster than calling
getClobberingAccess(store). We
could instead walk up the store
defining access chain and stop
if we get to a point that
dominates the earlier load or a
clobbering write. I'll have to
time this to see if it makes a
difference. I guess it will
depend on what percentage of the
time the clobber cache has been
thrown away.<br>
</blockquote>
<div><br>
</div>
<div>Yes. You are basically
designing a custom walker here,
and that's great.</div>
<div>If it matters, I would just
make it a walker class and </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <br>
As for updating when removing
stores: it seems like doing RAUW
getClobberingAccess(store) is
not optimal in some cases. </blockquote>
<div><br>
Just out of curiosity do you
have a real example of the
below? We work really hard to
avoid generating that.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">For
example:<br>
<br>
store @G1 ; 1 = MD(entry)<br>
store @G2 ; 2 = MD(1)<br>
store %p ; 3 = MD(2)<br>
load @G1 ; MU(3)<br>
load @G2 ; MU(3)<br>
</blockquote>
<div><br>
</div>
<div>These should not be MU(3),
they should be MU(1) and MU(2),
unless the store is *actually* a
clobber for them. If the store
is actually a clobber, i don't
see how you could legally remove
or move the store (since it
conflicts with G1/G2) :)</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <br>
If we remove 3 and RUAW
getClobberingAccess(3) (=2) we
get:<br>
<br>
store @G1 ; 1 = MD(entry)<br>
store @G2 ; 2 = MD(1)<br>
load @G1 ; MU(2)<br>
load @G2 ; MU(2)</blockquote>
<div><br>
</div>
<div>I agree that if you can get
memoryssa to generate the first,
you will get the second and it
will be non-optimal. If you
can, and it's legal to move or
remove the store, that seems
like a super-rare case unless
i'm missing something?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Just to reply to this one particular point
for now, here's a case where this can
happen because of EarlyCSE removing a
store that is simply storing back the
value just loaded from the same memory
location:<br>
<br>
<br>
define void @test3(i32* %p) {<br>
entry:<br>
; 1 = MemoryDef(liveOnEntry)<br>
store i32 1, i32* @G1<br>
; 2 = MemoryDef(1)<br>
store i32 2, i32* @G2<br>
; MemoryUse(2)<br>
%xp = load i32, i32* %p<br>
; 3 = MemoryDef(2)<br>
store i32 %xp, i32* %p<br>
; MemoryUse(3)<br>
%x1 = load i32, i32* @G1<br>
; MemoryUse(3)<br>
%x2 = load i32, i32* @G2<br>
; 4 = MemoryDef(3)<br>
call void @foo(i32 %x1, i32 %x2, i32
%xp)<br>
ret void<br>
}<br>
<br>
Removing: store i32 %xp, i32* %p<br>
<br>
define void @test3(i32* %p) {<br>
entry:<br>
; 1 = MemoryDef(liveOnEntry)<br>
store i32 1, i32* @G1<br>
; 2 = MemoryDef(1)<br>
store i32 2, i32* @G2<br>
; MemoryUse(2)<br>
%xp = load i32, i32* %p<br>
; MemoryUse(2)<br>
%x1 = load i32, i32* @G1<br>
; MemoryUse(2)<br>
%x2 = load i32, i32* @G2<br>
; 4 = MemoryDef(2)<br>
call void @foo(i32 %x1, i32 %x2, i32
%xp)<br>
ret void<br>
}<br>
<br>
</blockquote>
<br>
</div>
</div>
... and to push this case a bit further, it
seems like it wouldn't be too hard to construct
a case where removing this store would
necessitate looking down through MemoryPhis in
order to maintain the optimized-use invariant,
possibly even leading to more MemoryPhi
optimization?<br>
<br>
<blockquote type="cite"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>If you have a testcase with a
real example, that would be
super-interesting to me.<br>
</div>
<div>Because if we can't get it to
generate precise info in most if
not all cases, it's possibly not
worth doing the work we are doing
to optimize uses.</div>
<div><br>
</div>
<div>I don't claim we make the
perfect set of tradeoffs right
now. I know for the use cases i
cared about, it makes an IMHO good
set of choices.</div>
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <br>
</blockquote>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> but the
load @G1 would be more precise if
it was MU(1) (and the invariant
that defining access == clobbering
access would be broken). Is this
just a compile-time/precision
trade-off? </blockquote>
<div><br>
</div>
<div>Kinda. The main problem is it
affects everyone downstream *right
now* (again, we could change this
at the cost of having the walker
mutate loads).</div>
<div><br>
</div>
<div>Imagine something upstream of
you generates imprecise info for
loads.</div>
<div>You use getClobberingAccess to
work around it, because you want
the best answer.</div>
<div><br>
Assume everything preserves
MemorySSA.</div>
<div>We still can't really cache
this sanely with a side-cache:
either it gets blown away, or
takes up too much space.</div>
<div>Caching the result of every
load without rewriting is another
O(N) space factor (we did it
before, and george had to undo it
:P).</div>
<div><br>
</div>
<div>So now you wasted a bunch of
time looking up the real
clobbering result for the load.</div>
<div>So will the next pass, and the
pass after that, and ...</div>
<div><br>
</div>
<div>Stores get pushed *upwards* a
lot less than loads, so calling
getClobberingMemoryAccess on them
is more rare, and caching them is
more sane, and "misses" are lower
cost.</div>
<div><br>
</div>
<div>For loads, we can work around
having to have a separate cache by
rewriting the defining access in
the main walker as a cache (and
marking them as optimized with a
single bit or something, so we
know not to waste time doing it
again until the defining access
gets reset again), but then the
walker is mutating the MemorySSA,
which is "not ideal" if we can
avoid it.</div>
<div><br>
</div>
<div>If we can't, such is life, i'm
basically trying to kick this can
as far down the road as i can
before we use that solution :)</div>
<div><br>
</div>
<div>But y'all are the ones doing a
lot of the work at this point, so
let me know if you think we should
stop trying to maintain these
invariants.</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">Maybe for
that reason it makes more sense to
let the client decide if they want
to do the simple RAUW
getClobberingAccess(Store) or
optimize each use separately?<br>
<br>
</blockquote>
<div><br>
If we can make it come out optimal
for everyone, we should do that.</div>
<div>Otherwise, one client screws up
another :)<br>
</div>
<div><br>
</div>
<div>If we can't, yeah, sure, then
we start making tradeoffs.</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<pre cols="72">--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
<br>
<fieldset></fieldset>
<br>
</span>
<pre>______________________________<wbr>_________________
llvm-commits mailing list
<a moz-do-not-send="true" href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>
<a moz-do-not-send="true" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a>
</pre>
</blockquote><span>
<pre cols="72">--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
</span></div>
</blockquote></div>
</div>
</div></div></blockquote></div>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre></body></html>