[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.
Geoff Berry via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 23 11:17:13 PDT 2016
On 8/23/2016 1:47 PM, Daniel Berlin wrote:
>
>
> On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org
> <mailto:gberry at codeaurora.org>> wrote:
>
> gberry added inline comments.
>
> ================
> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
> @@ +536,3 @@
> + MemoryAccess *LaterDef =
> + MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
> + return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
> ----------------
> dberlin wrote:
> > gberry wrote:
> > > dberlin wrote:
> > > > 1. For loads, you don't have to ask for the clobbering access. It's already optimized such that getDefiningAccess == the clobbering access.
> > > >
> > > > 2. For stores, not sure if you realize this, but
> > > >
> > > > given
> > > > store q (let's call this a)
> > > > x = load p
> > > > store q (let's call this b)
> > > >
> > > >
> > > > if you call getClobberingMemoryAccess on b, it will return a.
> > > >
> > > >
> > > For 1., I was not clear on whether this holds true after store removal.
> > >
> > > For 2., yeah, I get this, but I'm not sure what you're getting at. In this change, the removal of the second store by EarlyCSE doesn't use MemorySSA to check for intervening loads; it uses the 'LastStore' tracking to know when a store made redundant by a second store can be removed.
> > 1. Updates have to make it hold after store removal :)
> >
> > The problem is that if we don't keep this invariant up to date, everyone ends up calling getClobberingAccess, which does a bunch of work only to discover that the load already points at the same thing.
> >
> > The cost of everyone doing that is much higher than the cost of one person updating the dominating def.
> >
> > (There is one case where getClobberingAccess will give you a better answer, and that is in cases where we gave up during use optimization. I only have one testcase where this occurs. We only give up on optimizing a load if it's going to be super expensive, and you probably do *not* want to try to get better answers in that case.)
> >
> > As for updating when you remove stores, you should simply be able to replace any loads that use the store with getClobberingAccess(store) using RAUW.
> >
> > Under the covers, removeMemoryAccess calls RAUW with the DefiningAccess.
> > We could change it to use getClobberingMemoryAccess for loads, and DefiningAccess for stores.
> >
> > 2. ah, okay.
> >
> >
> Okay, I get why just checking the defining access for loads is
> better (we get to skip the AA check).
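(Just to make sure I'm reading this right: for the load case the check in the patch would reduce to something like the following. Completely untested, and it only holds if the defining access really is kept optimized across updates, per the invariant discussed above.)

if (auto *LaterMU = dyn_cast_or_null<MemoryUse>(MSSA->getMemoryAccess(LaterInst))) {
  // The MemoryUse's defining access has already been optimized to the
  // clobbering access, so use it directly and skip the walker (and the
  // AA queries the walker would make).
  MemoryAccess *LaterDef = LaterMU->getDefiningAccess();
  return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
}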
>
> For stores, we may be able to do something faster than calling
> getClobberingAccess(store). We could instead walk up the store's
> defining access chain and stop when we either reach an access that
> dominates the earlier load or hit a write that clobbers it. I'll
> have to time this to see if it makes a difference. I guess it will
> depend on what percentage of the time the clobber cache has been
> thrown away.
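Concretely, something along these lines is what I have in mind (completely untested sketch; the function name and the exact AA query are just illustrative):

// Walk the later store's defining-access chain upwards.  If we reach an
// access that dominates the earlier load before we see anything that may
// write the load's location, there is no intervening clobber.
bool isSameGenerationForStore(MemorySSA *MSSA, AliasAnalysis *AA,
                              LoadInst *EarlierLoad, StoreInst *LaterStore) {
  MemoryAccess *EarlierMA = MSSA->getMemoryAccess(EarlierLoad);
  MemoryLocation Loc = MemoryLocation::get(EarlierLoad);
  MemoryAccess *MA =
      cast<MemoryDef>(MSSA->getMemoryAccess(LaterStore))->getDefiningAccess();
  while (!MSSA->isLiveOnEntryDef(MA)) {
    if (MSSA->dominates(MA, EarlierMA))
      return true;   // reached an access at or above the earlier load
    auto *MD = dyn_cast<MemoryDef>(MA);
    if (!MD)
      return false;  // hit a MemoryPhi; give up conservatively
    if (AA->getModRefInfo(MD->getMemoryInst(), Loc) & MRI_Mod)
      return false;  // found a write that may clobber the earlier load
    MA = MD->getDefiningAccess();
  }
  return true;       // walked up to live-on-entry, which dominates everything
}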
>
>
> Yes. You are basically designing a custom walker here, and that's great.
> If it matters, I would just make it a walker class and
>
>
> As for updating when removing stores: it seems like doing RAUW with
> getClobberingAccess(store) is not optimal in some cases.
>
>
> Just out of curiosity, do you have a real example of the below? We
> work really hard to avoid generating that.
>
> For example:
>
> store @G1 ; 1 = MD(entry)
> store @G2 ; 2 = MD(1)
> store %p ; 3 = MD(2)
> load @G1 ; MU(3)
> load @G2 ; MU(3)
>
>
> These should not be MU(3), they should be MU(1) and MU(2), unless the
> store is *actually* a clobber for them. If the store is actually a
> clobber, I don't see how you could legally remove or move the store
> (since it conflicts with G1/G2) :)
>
>
> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>
> store @G1 ; 1 = MD(entry)
> store @G2 ; 2 = MD(1)
> load @G1 ; MU(2)
> load @G2 ; MU(2)
>
>
> I agree that if you can get MemorySSA to generate the first, you will
> get the second and it will be non-optimal. If you can, and it's legal
> to move or remove the store, that seems like a super-rare case unless
> I'm missing something?
Just to reply to this one particular point for now, here's a case where
this can happen because of EarlyCSE removing a store that is simply
storing back the value just loaded from the same memory location:
define void @test3(i32* %p) {
entry:
  ; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* @G1
  ; 2 = MemoryDef(1)
  store i32 2, i32* @G2
  ; MemoryUse(2)
  %xp = load i32, i32* %p
  ; 3 = MemoryDef(2)
  store i32 %xp, i32* %p
  ; MemoryUse(3)
  %x1 = load i32, i32* @G1
  ; MemoryUse(3)
  %x2 = load i32, i32* @G2
  ; 4 = MemoryDef(3)
  call void @foo(i32 %x1, i32 %x2, i32 %xp)
  ret void
}
Removing: store i32 %xp, i32* %p
define void @test3(i32* %p) {
entry:
  ; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* @G1
  ; 2 = MemoryDef(1)
  store i32 2, i32* @G2
  ; MemoryUse(2)
  %xp = load i32, i32* %p
  ; MemoryUse(2)
  %x1 = load i32, i32* @G1
  ; MemoryUse(2)
  %x2 = load i32, i32* @G2
  ; 4 = MemoryDef(2)
  call void @foo(i32 %x1, i32 %x2, i32 %xp)
  ret void
}
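(For context, the removal above is EarlyCSE's existing store-of-just-loaded-value case and has nothing to do with MemorySSA. The trigger is essentially the condition below, written as a rough sketch rather than the actual EarlyCSE code, with SI being the store under consideration and noWriteBetween a hypothetical stand-in for the "same memory generation" check:)

// A store whose value operand is the result of a load from the same
// pointer, with no intervening write to that memory, just writes back
// what is already there and can be deleted.
if (auto *LI = dyn_cast<LoadInst>(SI->getValueOperand()))
  if (LI->getPointerOperand() == SI->getPointerOperand() &&
      noWriteBetween(LI, SI))   // hypothetical helper
    SI->eraseFromParent();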
> If you have a testcase with a real example, that would be
> super-interesting to me, because if we can't get it to generate
> precise info in most if not all cases, it's possibly not worth
> doing the work we are doing to optimize uses.
>
> I don't claim we make the perfect set of tradeoffs right now. I know
> that for the use cases I cared about, it makes what is IMHO a good set
> of choices.
>
>
>
> but the load @G1 would be more precise if it were MU(1) (and the
> invariant that defining access == clobbering access would be
> broken). Is this just a compile-time/precision trade-off?
>
>
> Kinda. The main problem is it affects everyone downstream *right now*
> (again, we could change this at the cost of having the walker mutate
> loads).
>
> Imagine something upstream of you generates imprecise info for loads.
> You use getClobberingAccess to work around it, because you want the
> best answer.
>
> Assume everything preserves MemorySSA.
> We still can't really cache this sanely with a side-cache: either it
> gets blown away, or takes up too much space.
> Caching the result of every load without rewriting is another O(N)
> space factor (we did it before, and George had to undo it :P).
>
> So now you wasted a bunch of time looking up the real clobbering
> result for the load.
> So will the next pass, and the pass after that, and ...
>
> Stores get pushed *upwards* a lot less than loads, so calling
> getClobberingMemoryAccess on them is more rare, and caching them is
> more sane, and "misses" are lower cost.
>
> For loads, we can work around having to have a separate cache by
> rewriting the defining access in the main walker as a cache (and
> marking them as optimized with a single bit or something, so we know
> not to waste time doing it again until the defining access gets reset
> again), but then the walker is mutating the MemorySSA, which is "not
> ideal" and something we'd like to avoid if we can.
>
> If we can't, such is life; I'm basically trying to kick this can as
> far down the road as I can before we use that solution :)
>
> But y'all are the ones doing a lot of the work at this point, so let
> me know if you think we should stop trying to maintain these invariants.
>
> Maybe for that reason it makes more sense to let the client decide
> if they want to do the simple RAUW with getClobberingAccess(Store) or
> optimize each use separately?
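To make the contrast concrete, the two update strategies for a removed store would look roughly like this (untested; the entry points are approximate, and the per-use variant hand-waves how the affected uses are collected and how their defining accesses get reset):

// Simple update: point every use of the dead store's MemoryDef at the
// store's own clobbering access.  Cheap, but as the @G1/@G2 example shows
// it can leave loads pointing at a def that doesn't actually clobber them.
void removeStoreSimple(MemorySSA *MSSA, MemoryDef *DeadDef) {
  MemoryAccess *NewDef =
      MSSA->getWalker()->getClobberingMemoryAccess(DeadDef);
  DeadDef->replaceAllUsesWith(NewDef);
  MSSA->removeMemoryAccess(DeadDef);
}

// Per-use update: give each load that used the dead store its own precise
// defining access, as if the store had never been there.
void removeStorePerUse(MemorySSA *MSSA, MemoryDef *DeadDef,
                       SmallVectorImpl<MemoryUse *> &AffectedUses) {
  MSSA->removeMemoryAccess(DeadDef);  // RAUWs remaining uses with the defining access
  for (MemoryUse *MU : AffectedUses) {
    // Re-run the clobber walk for each affected load.  The precise result
    // would then become MU's defining access; that reset has to happen
    // inside MemorySSA (there is no public setter), so this only
    // illustrates the intent.
    MemoryAccess *Precise = MSSA->getWalker()->getClobberingMemoryAccess(MU);
    (void)Precise;
  }
}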
>
>
> If we can make it come out optimal for everyone, we should do that.
> Otherwise, one client screws up another :)
>
> If we can't, yeah, sure, then we start making tradeoffs.
>
>
--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.