[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.
Geoff Berry via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 25 08:36:49 PDT 2016
Sounds good. I'll go ahead and check in what I have now (since it is
NFC) and we can proceed from there.
I'm currently trying out an approach where, upon removing a store, I
eagerly re-optimize phis but just mark uses as dirty, then only
re-compute the clobber for dirty uses if they come up in an
isSameMemGeneration query or at the end of the pass. This is sort of a
hybrid of the two approaches you described.
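
Roughly the shape of this (sketch only; the DirtyUses set and helper
names below are made up, the eager phi re-optimization isn't shown, and
only getDefiningAccess/getClobberingMemoryAccess are the real MemorySSA
calls):

  // Uses whose cached defining access may be stale because a store they
  // hung off of was removed.
  SmallPtrSet<MemoryUse *, 16> DirtyUses;

  // Called just before removing a store's MemoryDef: don't re-optimize
  // its users eagerly, just remember them.
  void markUsesDirty(MemoryDef *StoreAccess) {
    for (User *U : StoreAccess->users())
      if (auto *MU = dyn_cast<MemoryUse>(U))
        DirtyUses.insert(MU);
  }

  // Called from isSameMemGeneration (and over any leftovers at the end
  // of the pass): only pay for a clobber walk if the use is dirty.
  MemoryAccess *getClobber(MemoryUse *MU) {
    if (!DirtyUses.count(MU))
      return MU->getDefiningAccess(); // still optimized, cheap
    MemoryAccess *Clobber =
        MSSA->getWalker()->getClobberingMemoryAccess(MU->getMemoryInst());
    DirtyUses.erase(MU);
    return Clobber;
  }
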
On 8/24/2016 7:34 PM, Daniel Berlin wrote:
> (modulo this, btw, the code LGTM. I definitely don't think we should
> wait to resolve this before you commit it).
>
>
> On Tue, Aug 23, 2016 at 1:32 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Ugh.
> Okay, so.
> A couple thoughts:
>
> 1. We could maintain it by doing batch use opt at the end of each
> pass or something, and then using the faster use optimizer to
> reoptimize affected loads (it's easy to tell the set of possibly
> affected loads).
>
> 2. We could drop it. We would still build an initially optimized
> form, but whatever happens past that, happens. We would make
> getClobberingMemoryAccess in the default walker modify loads to
> store the optimized result on them, and add a bool to MemoryUse to
> track which loads may need to be reoptimized (that gets reset in
> setDefiningAccess unless you tell it not to).
>
> This will let you call getClobberingMemoryAccess on everything
> without it being really expensive for loads (right now it does a
> bunch of search/etc setup, and we need a way to avoid this)
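>
> Very roughly, the shape of 2 (sketch only; the bit, walker, and helper
> names below are made up to illustrate the idea, not real API):
>
>   MemoryAccess *Walker::getClobberingMemoryAccess(MemoryAccess *MA) {
>     auto *MU = dyn_cast<MemoryUse>(MA);
>     if (MU && !MU->mayNeedReopt())
>       return MU->getDefiningAccess();         // already optimized, no search
>     MemoryAccess *Clobber = doClobberWalk(MA); // the expensive part
>     if (MU) {
>       // Cache the answer on the load itself and clear the "may need
>       // reopt" bit; setDefiningAccess would normally set it again
>       // unless told not to.
>       MU->setDefiningAccess(Clobber, /*MayNeedReopt=*/false);
>     }
>     return Clobber;
>   }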
>
> The downsides to all of this are:
>
> 1. The walker now modifies loads. Even if you are storing the
> defining accesses somewhere, and we change it (not sure why this
> would happen, but ...), it should be okay, it would just mean you
> have a valid but conservative answer.
>
> 2. It means a store's uses will no longer include all possible
> loads that die there. It will only include some, unless you run
> around and call getClobberingMemoryAccess on every load.
>
> I will have to do that in NewGVN to make load/store forwarding
> work sanely, but i'll deal with that.
>
>
> + a few others for thoughts.
>
>
> On Tue, Aug 23, 2016 at 1:04 PM, Geoff Berry <gberry at codeaurora.org> wrote:
>
>
>
> On 8/23/2016 2:17 PM, Geoff Berry via llvm-commits wrote:
>>
>>
>>
>> On 8/23/2016 1:47 PM, Daniel Berlin wrote:
>>>
>>>
>>> On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org> wrote:
>>>
>>> gberry added inline comments.
>>>
>>> ================
>>> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
>>> @@ +536,3 @@
>>> +    MemoryAccess *LaterDef =
>>> +        MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
>>> +    return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
>>> ----------------
>>> dberlin wrote:
>>> > gberry wrote:
>>> > > dberlin wrote:
>>> > > > 1. For loads, you don't have to ask for the
>>> clobbering access. It's already optimized such that
>>> getDefiningAccess == the clobbering access
>>> > > >
>>> > > > 2. For stores, not sure if you realize this, but
>>> > > >
>>> > > > given
>>> > > > store q (lets's call this a)
>>> > > > x = load p
>>> > > > store q (let's call this b)
>>> > > >
>>> > > >
>>> > > > if you call getClobberingMemoryAccess on b, it
>>> will return a.
>>> > > >
>>> > > >
>>> > > For 1., I was not clear on whether this holds true
>>> after store removal.
>>> > >
>>> > > For 2., yeah I get this, I'm not sure what you're
>>> getting at though. The removal of this second store by
>>> EarlyCSE doesn't use MemorySSA to check for intervening
>>> loads in this change. It uses the 'LastStore' tracking
>>> to know when a store made redundant by a second store
>>> can be removed.
>>> > 1. Updates have to make it hold after store removal :)
>>> >
>>> > The problem is that if we don't keep this invariant up
>>> to date, it means everyone uses getClobberingAccess,
>>> which does a bunch of work to discover the load already
>>> points to the same thing.
>>> >
>>> > Everyone doing that is much higher than the cost of
>>> one person updating the dominating def.
>>> >
>>> > (there is one case where getClobberingAccess will give
>>> you a better answer, and that is on cases where we gave
>>> up during use optimization. I only have one testcase
>>> this occurs on. We only give up on optimizing a load if
>>> it's going to be super expensive, and you probably do
>>> *not* want to try to get better answers in that case).
>>> >
>>> > As for updating when you remove stores, you should
>>> simply be able to replace the store in any loads that use it with
>>> getClobberingAccess(store) using RAUW.
>>> >
>>> > Under the covers, removeMemoryAccess calls RAUW with
>>> the DefiningAccess.
>>> > We could change it to use getClobberingMemoryAccess
>>> for loads, and DefiningAccess for stores.
>>> >
>>> > 2. ah, okay.
>>> >
>>> >
>>> Okay, I get why just checking the defining access for
>>> loads is better (we get to skip the AA check).
>>>
>>> For stores, we may be able to do something faster than
>>> calling getClobberingAccess(store). We could instead
>>> walk up the store defining access chain and stop if we
>>> get to a point that dominates the earlier load or a
>>> clobbering write. I'll have to time this to see if it
>>> makes a difference. I guess it will depend on what
>>> percentage of the time the clobber cache has been thrown
>>> away.
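>>>
>>> Something along these lines (sketch; the helper name is made up,
>>> MSSA/AA are assumed members, and MemoryPhis are just handled
>>> conservatively):
>>>
>>>   // Same question as
>>>   //   MSSA->dominates(getClobberingMemoryAccess(LaterStore), EarlierAccess)
>>>   // but stop walking as soon as the answer is known.
>>>   bool clobberDominatesEarlier(MemoryDef *LaterStoreAccess,
>>>                                MemoryAccess *EarlierAccess,
>>>                                const MemoryLocation &StoreLoc) {
>>>     MemoryAccess *A = LaterStoreAccess->getDefiningAccess();
>>>     while (true) {
>>>       // Everything further up the def chain dominates A, so once A
>>>       // dominates the earlier access the eventual clobber does too.
>>>       if (MSSA->dominates(A, EarlierAccess))
>>>         return true;
>>>       auto *D = dyn_cast<MemoryDef>(A);
>>>       if (!D)
>>>         return false; // MemoryPhi: give up conservatively
>>>       if (AA->getModRefInfo(D->getMemoryInst(), StoreLoc) & MRI_Mod)
>>>         return false; // found the clobbering write; it doesn't dominate
>>>       A = D->getDefiningAccess();
>>>     }
>>>   }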
>>>
>>>
>>> Yes. You are basically designing a custom walker here, and
>>> that's great.
>>> If it matters, I would just make it a walker class and
>>>
>>>
>>> As for updating when removing stores: it seems like
>>> doing RAUW getClobberingAccess(store) is not optimal in
>>> some cases.
>>>
>>>
>>> Just out of curiosity do you have a real example of the
>>> below? We work really hard to avoid generating that.
>>>
>>> For example:
>>>
>>> store @G1 ; 1 = MD(entry)
>>> store @G2 ; 2 = MD(1)
>>> store %p ; 3 = MD(2)
>>> load @G1 ; MU(3)
>>> load @G2 ; MU(3)
>>>
>>>
>>> These should not be MU(3), they should be MU(1) and MU(2),
>>> unless the store is *actually* a clobber for them. If the
>>> store is actually a clobber, i don't see how you could
>>> legally remove or move the store (since it conflicts with
>>> G1/G2) :)
>>>
>>>
>>> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>>>
>>> store @G1 ; 1 = MD(entry)
>>> store @G2 ; 2 = MD(1)
>>> load @G1 ; MU(2)
>>> load @G2 ; MU(2)
>>>
>>>
>>> I agree that if you can get memoryssa to generate the first,
>>> you will get the second and it will be non-optimal. If you
>>> can, and it's legal to move or remove the store, that seems
>>> like a super-rare case unless i'm missing something?
>>
>> Just to reply to this one particular point for now, here's a
>> case where this can happen because of EarlyCSE removing a
>> store that is simply storing back the value just loaded from
>> the same memory location:
>>
>>
>> define void @test3(i32* %p) {
>> entry:
>>   ; 1 = MemoryDef(liveOnEntry)
>>   store i32 1, i32* @G1
>>   ; 2 = MemoryDef(1)
>>   store i32 2, i32* @G2
>>   ; MemoryUse(2)
>>   %xp = load i32, i32* %p
>>   ; 3 = MemoryDef(2)
>>   store i32 %xp, i32* %p
>>   ; MemoryUse(3)
>>   %x1 = load i32, i32* @G1
>>   ; MemoryUse(3)
>>   %x2 = load i32, i32* @G2
>>   ; 4 = MemoryDef(3)
>>   call void @foo(i32 %x1, i32 %x2, i32 %xp)
>>   ret void
>> }
>>
>> Removing: store i32 %xp, i32* %p
>>
>> define void @test3(i32* %p) {
>> entry:
>>   ; 1 = MemoryDef(liveOnEntry)
>>   store i32 1, i32* @G1
>>   ; 2 = MemoryDef(1)
>>   store i32 2, i32* @G2
>>   ; MemoryUse(2)
>>   %xp = load i32, i32* %p
>>   ; MemoryUse(2)
>>   %x1 = load i32, i32* @G1
>>   ; MemoryUse(2)
>>   %x2 = load i32, i32* @G2
>>   ; 4 = MemoryDef(2)
>>   call void @foo(i32 %x1, i32 %x2, i32 %xp)
>>   ret void
>> }
>>
>
> ... and to push this case a bit further, it seems like it
> wouldn't be too hard to construct a case where removing this
> store would necessitate looking down through MemoryPhis in
> order to maintain the optimized-use invariant, possibly even
> leading to more MemoryPhi optimization?
>
>>
>>> If you have a testcase with a real example, that would be
>>> super-interesting to me.
>>> Because if we can't get it to generate precise info in most
>>> if not all cases, it's possibly not worth doing the work we
>>> are doing to optimize uses.
>>>
>>> I don't claim we make the perfect set of tradeoffs right
>>> now. I know for the use cases i cared about, it makes an
>>> IMHO good set of choices.
>>>
>>>
>>>
>>> but the load @G1 would be more precise if it was MU(1)
>>> (and the invariant that defining access == clobbering
>>> access would be broken). Is this just a
>>> compile-time/precision trade-off?
>>>
>>>
>>> Kinda. The main problem is it affects everyone downstream
>>> *right now* (again, we could change this at the cost of
>>> having the walker mutate loads).
>>>
>>> Imagine something upstream of you generates imprecise info
>>> for loads.
>>> You use getClobberingAccess to work around it, because you
>>> want the best answer.
>>>
>>> Assume everything preserves MemorySSA.
>>> We still can't really cache this sanely with a side-cache:
>>> either it gets blown away, or takes up too much space.
>>> Caching the result of every load without rewriting is
>>> another O(N) space factor (we did it before, and george had
>>> to undo it :P).
>>>
>>> So now you wasted a bunch of time looking up the real
>>> clobbering result for the load.
>>> So will the next pass, and the pass after that, and ...
>>>
>>> Stores get pushed *upwards* a lot less than loads, so
>>> calling getClobberingMemoryAccess on them is more rare, and
>>> caching them is more sane, and "misses" are lower cost.
>>>
>>> For loads, we can work around having to have a separate
>>> cache by rewriting the defining access in the main walker as
>>> a cache (and marking them as optimized with a single bit or
>>> something, so we know not to waste time doing it again until
>>> the defining access gets reset again), but then the walker
>>> is mutating the MemorySSA, which is "not ideal" if we can
>>> avoid it.
>>>
>>> If we can't, such is life, i'm basically trying to kick this
>>> can as far down the road as i can before we use that solution :)
>>>
>>> But y'all are the ones doing a lot of the work at this
>>> point, so let me know if you think we should stop trying to
>>> maintain these invariants.
>>>
>>> Maybe for that reason it makes more sense to let the
>>> client decide if they want to do the simple RAUW
>>> getClobberingAccess(Store) or optimize each use separately?
>>>
>>>
>>> If we can make it come out optimal for everyone, we should
>>> do that.
>>> Otherwise, one client screws up another :)
>>>
>>> If we can't, yeah, sure, then we start making tradeoffs.
>>>
>>>
>>
>
--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.