[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.
Geoff Berry via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 23 13:04:55 PDT 2016
On 8/23/2016 2:17 PM, Geoff Berry via llvm-commits wrote:
>
>
>
> On 8/23/2016 1:47 PM, Daniel Berlin wrote:
>>
>>
>> On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org
>> <mailto:gberry at codeaurora.org>> wrote:
>>
>> gberry added inline comments.
>>
>> ================
>> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
>> @@ +536,3 @@
>> + MemoryAccess *LaterDef =
>> + MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
>> + return MSSA->dominates(LaterDef,
>> MSSA->getMemoryAccess(EarlierInst));
>> ----------------
>> dberlin wrote:
>> > gberry wrote:
>> > > dberlin wrote:
>> > > > 1. For loads, you don't have to ask for the clobbering
>> access. It's already optimized such that getDefiningAccess == the
>> clobbering access
>> > > >
>> > > > 2. For stores, not sure if you realize this, but
>> > > >
>> > > > given
>> > > > store q (lets's call this a)
>> > > > x = load p
>> > > > store q (let's call this b)
>> > > >
>> > > >
>> > > > if you call getClobberingMemoryAccess on b, it will return a.
>> > > >
>> > > >
>> > > For 1., I was not clear on whether this holds true after
>> store removal.
>> > >
>> > > For 2., yeah I get this, I'm not sure what you're getting at
>> though. The removal of this second store by EarlyCSE doesn't use
>> MemorySSA to check for intervening loads in this change. It uses
>> the 'LastStore' tracking to know when a store made redundant by a
>> second store can be removed.
>> > 1. Updates have to make it hold after store removal :)
>> >
>> > The problem is that if we don't keep this invariant up to date,
>> everyone has to use getClobberingAccess, which does a bunch of
>> work only to discover the load already points at the same thing.
>> >
>> > The cost of everyone doing that is much higher than the cost of
>> one pass updating the dominating def.
>> >
>> > (there is one case where getClobberingAccess will give you a
>> better answer, and that is on cases where we gave up during use
>> optimization. I only have one testcase this occurs on. We only
>> give up on optimizing a load if it's going to be super expensive,
>> and you probably do *not* want to try to get better answers in
>> that case).
>> >
>> > As for updating when you remove stores, you should simply be
>> able to replace any loads that use the store with
>> getClobberingAccess(store) using RAUW.
>> >
>> > Under the covers, removeMemoryAccess calls RAUW with the
>> DefiningAccess.
>> > We could change it to use getClobberingMemoryAccess for loads,
>> and DefiningAccess for stores.
>> >
>> > 2. ah, okay.
>> >
>> >
>> Okay, I get why just checking the defining access for loads is
>> better (we get to skip the AA check).
>>
>> For stores, we may be able to do something faster than calling
>> getClobberingAccess(store). We could instead walk up the store's
>> defining access chain, succeeding if we reach a point that
>> dominates the earlier load and failing if we hit a clobbering
>> write first. I'll have to time this to see if it makes a
>> difference. I guess it will depend on what percentage of the
>> time the clobber cache has been thrown away.
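(To sketch what I mean by walking the chain: here is the logic on a toy model of the def chain. ToyDef, noClobberBetween, and the integer "dominance numbers" are all made-up stand-ins for illustration; the real version would walk MemoryDefs and use MSSA->dominates() plus an AA query.)

```cpp
// Toy stand-in for a MemorySSA def chain: each def knows the def it
// is "defined by" (the next one up the chain), a number simulating
// straight-line dominance (smaller = earlier = dominates later ones),
// and whether it may clobber the location we care about (standing in
// for an alias-analysis query).
struct ToyDef {
  ToyDef *DefiningAccess; // nullptr => liveOnEntry
  int DomNumber;
  bool ClobbersLoc;
};

// Walk up from the later store's defining access: succeed once we
// reach a def at or above the earlier load (it dominates the load, so
// nothing between them clobbers), fail if we hit a may-clobber write
// first.
bool noClobberBetween(ToyDef *LaterDef, int EarlierDomNumber) {
  for (ToyDef *D = LaterDef; D; D = D->DefiningAccess) {
    if (D->DomNumber <= EarlierDomNumber)
      return true;  // reached a def dominating the earlier load
    if (D->ClobbersLoc)
      return false; // an intervening write may clobber the location
  }
  return true;      // walked all the way to liveOnEntry
}
```

The hope is that on the common path this stops well before the full getClobberingAccess query would.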
>>
>>
>> Yes. You are basically designing a custom walker here, and that's great.
>> If it matters, I would just make it a walker class.
>>
>>
>> As for updating when removing stores: it seems like doing RAUW
>> with getClobberingAccess(store) is not optimal in some cases.
>>
>>
>> Just out of curiosity do you have a real example of the below? We
>> work really hard to avoid generating that.
>>
>> For example:
>>
>> store @G1 ; 1 = MD(entry)
>> store @G2 ; 2 = MD(1)
>> store %p ; 3 = MD(2)
>> load @G1 ; MU(3)
>> load @G2 ; MU(3)
>>
>>
>> These should not be MU(3), they should be MU(1) and MU(2), unless the
>> store is *actually* a clobber for them. If the store is actually a
>> clobber, i don't see how you could legally remove or move the store
>> (since it conflicts with G1/G2) :)
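(If I'm reading the use-optimization rule right, it amounts to pointing each MemoryUse at the nearest def up the chain that actually clobbers its location. A toy sketch, with hypothetical names and integer location IDs standing in for AA results:)

```cpp
// Toy def chain: each def records the def above it and the one
// location it clobbers (an integer ID standing in for an AA query).
struct Def {
  Def *Prev;        // nullptr => liveOnEntry
  int ClobberedLoc;
};

// Use optimization: a MemoryUse's defining access should be the
// nearest def that actually clobbers the use's location, not merely
// the nearest def of any kind.
Def *nearestClobber(Def *Start, int Loc) {
  for (Def *D = Start; D; D = D->Prev)
    if (D->ClobberedLoc == Loc)
      return D;
  return nullptr; // no clobber found: liveOnEntry
}
```

On the example above this yields MU(1) for load @G1, MU(2) for load @G2, and MU(3) only for a load that the store %p actually clobbers.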
>>
>>
>> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>>
>> store @G1 ; 1 = MD(entry)
>> store @G2 ; 2 = MD(1)
>> load @G1 ; MU(2)
>> load @G2 ; MU(2)
>>
>>
>> I agree that if you can get memoryssa to generate the first, you will
>> get the second and it will be non-optimal. If you can, and it's
>> legal to move or remove the store, that seems like a super-rare case
>> unless i'm missing something?
>
> Just to reply to this one particular point for now, here's a case
> where this can happen because of EarlyCSE removing a store that is
> simply storing back the value just loaded from the same memory location:
>
>
> define void @test3(i32* %p) {
> entry:
> ; 1 = MemoryDef(liveOnEntry)
> store i32 1, i32* @G1
> ; 2 = MemoryDef(1)
> store i32 2, i32* @G2
> ; MemoryUse(2)
> %xp = load i32, i32* %p
> ; 3 = MemoryDef(2)
> store i32 %xp, i32* %p
> ; MemoryUse(3)
> %x1 = load i32, i32* @G1
> ; MemoryUse(3)
> %x2 = load i32, i32* @G2
> ; 4 = MemoryDef(3)
> call void @foo(i32 %x1, i32 %x2, i32 %xp)
> ret void
> }
>
> Removing: store i32 %xp, i32* %p
>
> define void @test3(i32* %p) {
> entry:
> ; 1 = MemoryDef(liveOnEntry)
> store i32 1, i32* @G1
> ; 2 = MemoryDef(1)
> store i32 2, i32* @G2
> ; MemoryUse(2)
> %xp = load i32, i32* %p
> ; MemoryUse(2)
> %x1 = load i32, i32* @G1
> ; MemoryUse(2)
> %x2 = load i32, i32* @G2
> ; 4 = MemoryDef(2)
> call void @foo(i32 %x1, i32 %x2, i32 %xp)
> ret void
> }
>
... and to push this case a bit further, it seems like it wouldn't be
too hard to construct a case where removing this store would necessitate
looking down through MemoryPhis in order to maintain the optimized-use
invariant, possibly even leading to more MemoryPhi optimization?
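(For what it's worth, simulating the removeMemoryAccess update on a toy use-list model reproduces the rewrite above: every user of the removed 3 = MemoryDef(2) gets RAUW'd to its defining access, 2. ToyAccess and friends below are hypothetical stand-ins, not the real MemorySSA API.)

```cpp
#include <algorithm>
#include <vector>

// Toy MemorySSA access: an ID, the access it is defined by, and a
// use-list of accesses that name it as their defining access.
struct ToyAccess {
  int ID;
  ToyAccess *DefiningAccess = nullptr;
  std::vector<ToyAccess *> Users;
};

void setDefiningAccess(ToyAccess &A, ToyAccess &Def) {
  A.DefiningAccess = &Def;
  Def.Users.push_back(&A);
}

// Remove a MemoryDef by RAUW'ing its users with its own defining
// access -- the update removeMemoryAccess performs under the covers.
// (Assumes Dead is not liveOnEntry, i.e. DefiningAccess is non-null.)
void removeAccess(ToyAccess &Dead) {
  ToyAccess *Repl = Dead.DefiningAccess;
  for (ToyAccess *U : Dead.Users) {
    U->DefiningAccess = Repl;
    Repl->Users.push_back(U);
  }
  Dead.Users.clear();
  auto &RU = Repl->Users; // drop Dead from its replacement's use-list
  RU.erase(std::remove(RU.begin(), RU.end(), &Dead), RU.end());
}
```

Applied to the test3 accesses, removing 3 = MemoryDef(2) leaves %x1, %x2, and the call all pointing at 2, exactly the (suboptimal for @G1) result shown.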
>
>> If you have a testcase with a real example, that would be
>> super-interesting to me.
>> Because if we can't get it to generate precise info in most if not
>> all cases, it's possibly not worth doing the work we are doing to
>> optimize uses.
>>
>> I don't claim we make the perfect set of tradeoffs right now. I know
>> for the use cases i cared about, it makes an IMHO good set of choices.
>>
>>
>>
>> but the load @G1 would be more precise if it was MU(1) (and the
>> invariant that defining access == clobbering access would be
>> broken). Is this just a compile-time/precision trade-off?
>>
>>
>> Kinda. The main problem is it affects everyone downstream *right now*
>> (again, we could change this at the cost of having the walker mutate
>> loads).
>>
>> Imagine something upstream of you generates imprecise info for loads.
>> You use getClobberingAccess to work around it, because you want the
>> best answer.
>>
>> Assume everything preserves MemorySSA.
>> We still can't really cache this sanely with a side-cache: either it
>> gets blown away, or takes up too much space.
>> Caching the result of every load without rewriting is another O(N)
>> space factor (we did it before, and george had to undo it :P).
>>
>> So now you wasted a bunch of time looking up the real clobbering
>> result for the load.
>> So will the next pass, and the pass after that, and ...
>>
>> Stores get pushed *upwards* a lot less than loads, so calling
>> getClobberingMemoryAccess on them is more rare, and caching them is
>> more sane, and "misses" are lower cost.
>>
>> For loads, we can work around having to have a separate cache by
>> rewriting the defining access in the main walker as a cache (and
>> marking them as optimized with a single bit or something, so we know
>> not to waste time doing it again until the defining access gets reset
>> again), but then the walker is mutating the MemorySSA, which is "not
>> ideal" if we can avoid it.
>>
>> If we can't, such is life, i'm basically trying to kick this can as
>> far down the road as i can before we use that solution :)
>>
>> But y'all are the ones doing a lot of the work at this point, so let
>> me know if you think we should stop trying to maintain these invariants.
>>
>> Maybe for that reason it makes more sense to let the client
>> decide whether they want to do the simple RAUW with
>> getClobberingAccess(Store) or optimize each use separately?
>>
>>
>> If we can make it come out optimal for everyone, we should do that.
>> Otherwise, one client screws up another :)
>>
>> If we can't, yeah, sure, then we start making tradeoffs.
>>
>>
>
--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.