[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.
Geoff Berry via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 23 11:17:13 PDT 2016
On 8/23/2016 1:47 PM, Daniel Berlin wrote:
>
>
> On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org
> <mailto:gberry at codeaurora.org>> wrote:
>
> gberry added inline comments.
>
> ================
> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
> @@ +536,3 @@
> + MemoryAccess *LaterDef =
> + MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
> + return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
> ----------------
> dberlin wrote:
> > gberry wrote:
> > > dberlin wrote:
> > > > 1. For loads, you don't have to ask for the clobbering access. It's already optimized such that getDefiningAccess == the clobbering access.
> > > >
> > > > 2. For stores, not sure if you realize this, but
> > > >
> > > > given
> > > > store q (let's call this a)
> > > > x = load p
> > > > store q (let's call this b)
> > > >
> > > >
> > > > if you call getClobberingMemoryAccess on b, it will return a.
> > > >
> > > >
> > > For 1., I was not clear on whether this holds true after store removal.
> > >
> > > For 2., yeah, I get this, but I'm not sure what you're getting at. In this change, the removal of the second store by EarlyCSE doesn't use MemorySSA to check for intervening loads; it uses the 'LastStore' tracking to know when a store made redundant by a second store can be removed.
> > 1. Updates have to make it hold after store removal :)
> >
> > The problem is that if we don't keep this invariant up to date, everyone ends up calling getClobberingAccess, which does a bunch of work only to discover that the load already points at the same thing.
> >
> > The cost of everyone doing that is much higher than the cost of one person updating the dominating def.
> >
> > (There is one case where getClobberingAccess will give you a better answer, and that is in cases where we gave up during use optimization. I only have one testcase where this occurs. We only give up on optimizing a load if it's going to be super expensive, and you probably do *not* want to try to get better answers in that case.)
> >
> > As for updating when you remove stores, you should simply be able to replace any loads that use the store with getClobberingAccess(store) using RAUW.
> >
> > Under the covers, removeMemoryAccess calls RAUW with the DefiningAccess.
> > We could change it to use getClobberingMemoryAccess for loads, and DefiningAccess for stores.
> >
> > 2. ah, okay.
> >
> >
> Okay, I get why just checking the defining access for loads is
> better (we get to skip the AA check).
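(Just to make sure I'm reading this right: for the load case the check in the patch would reduce to something like the following. Completely untested, and it only holds if the defining access really is kept optimized across updates, per the invariant discussed above.)

if (auto *LaterMU = dyn_cast_or_null<MemoryUse>(MSSA->getMemoryAccess(LaterInst))) {
  // The MemoryUse's defining access has already been optimized to the
  // clobbering access, so use it directly and skip the walker (and the
  // AA queries the walker would make).
  MemoryAccess *LaterDef = LaterMU->getDefiningAccess();
  return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
}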
>
> For stores, we may be able to do something faster than calling
> getClobberingAccess(store). We could instead walk up the store's
> defining access chain and stop when we either reach an access that
> dominates the earlier load or hit a write that clobbers it. I'll
> have to time this to see if it makes a difference. I guess it will
> depend on what percentage of the time the clobber cache has been
> thrown away.
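Concretely, something along these lines is what I have in mind (completely untested sketch; the function name and the exact AA query are just illustrative):

// Walk the later store's defining-access chain upwards.  If we reach an
// access that dominates the earlier load before we see anything that may
// write the load's location, there is no intervening clobber.
bool isSameGenerationForStore(MemorySSA *MSSA, AliasAnalysis *AA,
                              LoadInst *EarlierLoad, StoreInst *LaterStore) {
  MemoryAccess *EarlierMA = MSSA->getMemoryAccess(EarlierLoad);
  MemoryLocation Loc = MemoryLocation::get(EarlierLoad);
  MemoryAccess *MA =
      cast<MemoryDef>(MSSA->getMemoryAccess(LaterStore))->getDefiningAccess();
  while (!MSSA->isLiveOnEntryDef(MA)) {
    if (MSSA->dominates(MA, EarlierMA))
      return true;   // reached an access at or above the earlier load
    auto *MD = dyn_cast<MemoryDef>(MA);
    if (!MD)
      return false;  // hit a MemoryPhi; give up conservatively
    if (AA->getModRefInfo(MD->getMemoryInst(), Loc) & MRI_Mod)
      return false;  // found a write that may clobber the earlier load
    MA = MD->getDefiningAccess();
  }
  return true;       // walked up to live-on-entry, which dominates everything
}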
>
>
> Yes. You are basically designing a custom walker here, and that's great.
> If it matters, I would just make it a walker class and
>
>
> As for updating when removing stores: it seems like doing RAUW with
> getClobberingAccess(store) is not optimal in some cases.
>
>
> Just out of curiosity, do you have a real example of the below? We
> work really hard to avoid generating that.
>
> For example:
>
> store @G1 ; 1 = MD(entry)
> store @G2 ; 2 = MD(1)
> store %p ; 3 = MD(2)
> load @G1 ; MU(3)
> load @G2 ; MU(3)
>
>
> These should not be MU(3), they should be MU(1) and MU(2), unless the
> store is *actually* a clobber for them. If the store is actually a
> clobber, I don't see how you could legally remove or move the store
> (since it conflicts with G1/G2) :)
>
>
> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>
> store @G1 ; 1 = MD(entry)
> store @G2 ; 2 = MD(1)
> load @G1 ; MU(2)
> load @G2 ; MU(2)
>
>
> I agree that if you can get MemorySSA to generate the first, you will
> get the second and it will be non-optimal. If you can, and it's legal
> to move or remove the store, that seems like a super-rare case unless
> I'm missing something?
Just to reply to this one particular point for now, here's a case where
this can happen because of EarlyCSE removing a store that is simply
storing back the value just loaded from the same memory location:
define void @test3(i32* %p) {
entry:
  ; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* @G1
  ; 2 = MemoryDef(1)
  store i32 2, i32* @G2
  ; MemoryUse(2)
  %xp = load i32, i32* %p
  ; 3 = MemoryDef(2)
  store i32 %xp, i32* %p
  ; MemoryUse(3)
  %x1 = load i32, i32* @G1
  ; MemoryUse(3)
  %x2 = load i32, i32* @G2
  ; 4 = MemoryDef(3)
  call void @foo(i32 %x1, i32 %x2, i32 %xp)
  ret void
}
Removing: store i32 %xp, i32* %p
define void @test3(i32* %p) {
entry:
  ; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* @G1
  ; 2 = MemoryDef(1)
  store i32 2, i32* @G2
  ; MemoryUse(2)
  %xp = load i32, i32* %p
  ; MemoryUse(2)
  %x1 = load i32, i32* @G1
  ; MemoryUse(2)
  %x2 = load i32, i32* @G2
  ; 4 = MemoryDef(2)
  call void @foo(i32 %x1, i32 %x2, i32 %xp)
  ret void
}
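(For context, the removal above is EarlyCSE's existing store-of-just-loaded-value case and has nothing to do with MemorySSA. The trigger is essentially the condition below, written as a rough sketch rather than the actual EarlyCSE code, with SI being the store under consideration and noWriteBetween a hypothetical stand-in for the "same memory generation" check:)

// A store whose value operand is the result of a load from the same
// pointer, with no intervening write to that memory, just writes back
// what is already there and can be deleted.
if (auto *LI = dyn_cast<LoadInst>(SI->getValueOperand()))
  if (LI->getPointerOperand() == SI->getPointerOperand() &&
      noWriteBetween(LI, SI))   // hypothetical helper
    SI->eraseFromParent();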
> If you have a testcase with a real example, that would be
> super-interesting to me, because if we can't get it to generate
> precise info in most if not all cases, it's possibly not worth
> doing the work we are doing to optimize uses.
>
> I don't claim we make the perfect set of tradeoffs right now. I know
> that for the use cases I cared about, it makes what is IMHO a good set
> of choices.
>
>
>
> but the load @G1 would be more precise if it were MU(1) (and the
> invariant that defining access == clobbering access would be
> broken). Is this just a compile-time/precision trade-off?
>
>
> Kinda. The main problem is it affects everyone downstream *right now*
> (again, we could change this at the cost of having the walker mutate
> loads).
>
> Imagine something upstream of you generates imprecise info for loads.
> You use getClobberingAccess to work around it, because you want the
> best answer.
>
> Assume everything preserves MemorySSA.
> We still can't really cache this sanely with a side-cache: either it
> gets blown away, or takes up too much space.
> Caching the result of every load without rewriting is another O(N)
> space factor (we did it before, and George had to undo it :P).
>
> So now you wasted a bunch of time looking up the real clobbering
> result for the load.
> So will the next pass, and the pass after that, and ...
>
> Stores get pushed *upwards* a lot less than loads, so calling
> getClobberingMemoryAccess on them is more rare, and caching them is
> more sane, and "misses" are lower cost.
>
> For loads, we can work around having to have a separate cache by
> rewriting the defining access in the main walker as a cache (and
> marking them as optimized with a single bit or something, so we know
> not to waste time doing it again until the defining access gets reset
> again), but then the walker is mutating the MemorySSA, which is "not
> ideal" and something we'd like to avoid if we can.
>
> If we can't, such is life; I'm basically trying to kick this can as
> far down the road as I can before we use that solution :)
>
> But y'all are the ones doing a lot of the work at this point, so let
> me know if you think we should stop trying to maintain these invariants.
>
> Maybe for that reason it makes more sense to let the client decide
> if they want to do the simple RAUW with getClobberingAccess(Store) or
> optimize each use separately?
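To make the contrast concrete, the two update strategies for a removed store would look roughly like this (untested; the entry points are approximate, and the per-use variant hand-waves how the affected uses are collected and how their defining accesses get reset):

// Simple update: point every use of the dead store's MemoryDef at the
// store's own clobbering access.  Cheap, but as the @G1/@G2 example shows
// it can leave loads pointing at a def that doesn't actually clobber them.
void removeStoreSimple(MemorySSA *MSSA, MemoryDef *DeadDef) {
  MemoryAccess *NewDef =
      MSSA->getWalker()->getClobberingMemoryAccess(DeadDef);
  DeadDef->replaceAllUsesWith(NewDef);
  MSSA->removeMemoryAccess(DeadDef);
}

// Per-use update: give each load that used the dead store its own precise
// defining access, as if the store had never been there.
void removeStorePerUse(MemorySSA *MSSA, MemoryDef *DeadDef,
                       SmallVectorImpl<MemoryUse *> &AffectedUses) {
  MSSA->removeMemoryAccess(DeadDef);  // RAUWs remaining uses with the defining access
  for (MemoryUse *MU : AffectedUses) {
    // Re-run the clobber walk for each affected load.  The precise result
    // would then become MU's defining access; that reset has to happen
    // inside MemorySSA (there is no public setter), so this only
    // illustrates the intent.
    MemoryAccess *Precise = MSSA->getWalker()->getClobberingMemoryAccess(MU);
    (void)Precise;
  }
}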
>
>
> If we can make it come out optimal for everyone, we should do that.
> Otherwise, one client screws up another :)
>
> If we can't, yeah, sure, then we start making tradeoffs.
>
>
--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.