[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.
Geoff Berry via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 23 13:04:55 PDT 2016
On 8/23/2016 2:17 PM, Geoff Berry via llvm-commits wrote:
>
>
>
> On 8/23/2016 1:47 PM, Daniel Berlin wrote:
>>
>>
>> On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org
>> <mailto:gberry at codeaurora.org>> wrote:
>>
>> gberry added inline comments.
>>
>> ================
>> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
>> @@ +536,3 @@
>> + MemoryAccess *LaterDef =
>> + MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
>> + return MSSA->dominates(LaterDef,
>> MSSA->getMemoryAccess(EarlierInst));
>> ----------------
>> dberlin wrote:
>> > gberry wrote:
>> > > dberlin wrote:
>> > > > 1. For loads, you don't have to ask for the clobbering
>> access. It's already optimized such that getDefiningAccess == the
>> clobbering access
>> > > >
>> > > > 2. For stores, not sure if you realize this, but
>> > > >
>> > > > given
>> > > > store q (lets's call this a)
>> > > > x = load p
>> > > > store q (let's call this b)
>> > > >
>> > > >
>> > > > if you call getClobberingMemoryAccess on b, it will return a.
>> > > >
>> > > >
>> > > For 1., I was not clear on whether this holds true after
>> store removal.
>> > >
>> > > For 2., yeah I get this, I'm not sure what you're getting at
>> though. The removal of this second store by EarlyCSE doesn't use
>> MemorySSA to check for intervening loads in this change. It uses
>> the 'LastStore' tracking to know when a store made redundant by a
>> second store can be removed.
>> > 1. Updates have to make it hold after store removal :)
>> >
>> > The problem is that if we don't keep this invariant up to date,
>> everyone has to use getClobberingAccess, which does a bunch of
>> work only to discover the load already points at the same thing.
>> >
>> > The cost of everyone doing that is much higher than the cost of
>> one pass updating the dominating def.
>> >
>> > (there is one case where getClobberingAccess will give you a
>> better answer, and that is on cases where we gave up during use
>> optimization. I only have one testcase this occurs on. We only
>> give up on optimizing a load if it's going to be super expensive,
>> and you probably do *not* want to try to get better answers in
>> that case).
>> >
>> > As for updating when you remove stores, you should simply be
>> able to replace any loads that use the store with
>> getClobberingAccess(store) using RAUW.
>> >
>> > Under the covers, removeMemoryAccess calls RAUW with the
>> DefiningAccess.
>> > We could change it to use getClobberingMemoryAccess for loads,
>> and DefiningAccess for stores.
>> >
>> > 2. ah, okay.
>> >
>> >
>> Okay, I get why just checking the defining access for loads is
>> better (we get to skip the AA check).
>>
>> For stores, we may be able to do something faster than calling
>> getClobberingAccess(store). We could instead walk up the store's
>> defining access chain, succeeding if we reach a point that
>> dominates the earlier load and failing if we hit a clobbering
>> write first. I'll have to time this to see if it makes a
>> difference. I guess it will depend on what percentage of the
>> time the clobber cache has been thrown away.
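(To sketch what I mean by walking the chain: here is the logic on a toy model of the def chain. ToyDef, noClobberBetween, and the integer "dominance numbers" are all made-up stand-ins for illustration; the real version would walk MemoryDefs and use MSSA->dominates() plus an AA query.)

```cpp
// Toy stand-in for a MemorySSA def chain: each def knows the def it
// is "defined by" (the next one up the chain), a number simulating
// straight-line dominance (smaller = earlier = dominates later ones),
// and whether it may clobber the location we care about (standing in
// for an alias-analysis query).
struct ToyDef {
  ToyDef *DefiningAccess; // nullptr => liveOnEntry
  int DomNumber;
  bool ClobbersLoc;
};

// Walk up from the later store's defining access: succeed once we
// reach a def at or above the earlier load (it dominates the load, so
// nothing between them clobbers), fail if we hit a may-clobber write
// first.
bool noClobberBetween(ToyDef *LaterDef, int EarlierDomNumber) {
  for (ToyDef *D = LaterDef; D; D = D->DefiningAccess) {
    if (D->DomNumber <= EarlierDomNumber)
      return true;  // reached a def dominating the earlier load
    if (D->ClobbersLoc)
      return false; // an intervening write may clobber the location
  }
  return true;      // walked all the way to liveOnEntry
}
```

The hope is that on the common path this stops well before the full getClobberingAccess query would.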
>>
>>
>> Yes. You are basically designing a custom walker here, and that's great.
>> If it matters, I would just make it a walker class.
>>
>>
>> As for updating when removing stores: it seems like doing RAUW
>> with getClobberingAccess(store) is not optimal in some cases.
>>
>>
>> Just out of curiosity do you have a real example of the below? We
>> work really hard to avoid generating that.
>>
>> For example:
>>
>> store @G1 ; 1 = MD(entry)
>> store @G2 ; 2 = MD(1)
>> store %p ; 3 = MD(2)
>> load @G1 ; MU(3)
>> load @G2 ; MU(3)
>>
>>
>> These should not be MU(3), they should be MU(1) and MU(2), unless the
>> store is *actually* a clobber for them. If the store is actually a
>> clobber, i don't see how you could legally remove or move the store
>> (since it conflicts with G1/G2) :)
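(If I'm reading the use-optimization rule right, it amounts to pointing each MemoryUse at the nearest def up the chain that actually clobbers its location. A toy sketch, with hypothetical names and integer location IDs standing in for AA results:)

```cpp
// Toy def chain: each def records the def above it and the one
// location it clobbers (an integer ID standing in for an AA query).
struct Def {
  Def *Prev;        // nullptr => liveOnEntry
  int ClobberedLoc;
};

// Use optimization: a MemoryUse's defining access should be the
// nearest def that actually clobbers the use's location, not merely
// the nearest def of any kind.
Def *nearestClobber(Def *Start, int Loc) {
  for (Def *D = Start; D; D = D->Prev)
    if (D->ClobberedLoc == Loc)
      return D;
  return nullptr; // no clobber found: liveOnEntry
}
```

On the example above this yields MU(1) for load @G1, MU(2) for load @G2, and MU(3) only for a load that the store %p actually clobbers.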
>>
>>
>> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>>
>> store @G1 ; 1 = MD(entry)
>> store @G2 ; 2 = MD(1)
>> load @G1 ; MU(2)
>> load @G2 ; MU(2)
>>
>>
>> I agree that if you can get memoryssa to generate the first, you will
>> get the second and it will be non-optimal. If you can, and it's
>> legal to move or remove the store, that seems like a super-rare case
>> unless i'm missing something?
>
> Just to reply to this one particular point for now, here's a case
> where this can happen because of EarlyCSE removing a store that is
> simply storing back the value just loaded from the same memory location:
>
>
> define void @test3(i32* %p) {
> entry:
> ; 1 = MemoryDef(liveOnEntry)
> store i32 1, i32* @G1
> ; 2 = MemoryDef(1)
> store i32 2, i32* @G2
> ; MemoryUse(2)
> %xp = load i32, i32* %p
> ; 3 = MemoryDef(2)
> store i32 %xp, i32* %p
> ; MemoryUse(3)
> %x1 = load i32, i32* @G1
> ; MemoryUse(3)
> %x2 = load i32, i32* @G2
> ; 4 = MemoryDef(3)
> call void @foo(i32 %x1, i32 %x2, i32 %xp)
> ret void
> }
>
> Removing: store i32 %xp, i32* %p
>
> define void @test3(i32* %p) {
> entry:
> ; 1 = MemoryDef(liveOnEntry)
> store i32 1, i32* @G1
> ; 2 = MemoryDef(1)
> store i32 2, i32* @G2
> ; MemoryUse(2)
> %xp = load i32, i32* %p
> ; MemoryUse(2)
> %x1 = load i32, i32* @G1
> ; MemoryUse(2)
> %x2 = load i32, i32* @G2
> ; 4 = MemoryDef(2)
> call void @foo(i32 %x1, i32 %x2, i32 %xp)
> ret void
> }
>
... and to push this case a bit further, it seems like it wouldn't be
too hard to construct a case where removing this store would necessitate
looking down through MemoryPhis in order to maintain the optimized-use
invariant, possibly even leading to more MemoryPhi optimization?
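(For what it's worth, simulating the removeMemoryAccess update on a toy use-list model reproduces the rewrite above: every user of the removed 3 = MemoryDef(2) gets RAUW'd to its defining access, 2. ToyAccess and friends below are hypothetical stand-ins, not the real MemorySSA API.)

```cpp
#include <algorithm>
#include <vector>

// Toy MemorySSA access: an ID, the access it is defined by, and a
// use-list of accesses that name it as their defining access.
struct ToyAccess {
  int ID;
  ToyAccess *DefiningAccess = nullptr;
  std::vector<ToyAccess *> Users;
};

void setDefiningAccess(ToyAccess &A, ToyAccess &Def) {
  A.DefiningAccess = &Def;
  Def.Users.push_back(&A);
}

// Remove a MemoryDef by RAUW'ing its users with its own defining
// access -- the update removeMemoryAccess performs under the covers.
// (Assumes Dead is not liveOnEntry, i.e. DefiningAccess is non-null.)
void removeAccess(ToyAccess &Dead) {
  ToyAccess *Repl = Dead.DefiningAccess;
  for (ToyAccess *U : Dead.Users) {
    U->DefiningAccess = Repl;
    Repl->Users.push_back(U);
  }
  Dead.Users.clear();
  auto &RU = Repl->Users; // drop Dead from its replacement's use-list
  RU.erase(std::remove(RU.begin(), RU.end(), &Dead), RU.end());
}
```

Applied to the test3 accesses, removing 3 = MemoryDef(2) leaves %x1, %x2, and the call all pointing at 2, exactly the (suboptimal for @G1) result shown.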
>
>> If you have a testcase with a real example, that would be
>> super-interesting to me.
>> Because if we can't get it to generate precise info in most if not
>> all cases, it's possibly not worth doing the work we are doing to
>> optimize uses.
>>
>> I don't claim we make the perfect set of tradeoffs right now. I know
>> for the use cases i cared about, it makes an IMHO good set of choices.
>>
>>
>>
>> but the load @G1 would be more precise if it was MU(1) (and the
>> invariant that defining access == clobbering access would be
>> broken). Is this just a compile-time/precision trade-off?
>>
>>
>> Kinda. The main problem is it affects everyone downstream *right now*
>> (again, we could change this at the cost of having the walker mutate
>> loads).
>>
>> Imagine something upstream of you generates imprecise info for loads.
>> You use getClobberingAccess to work around it, because you want the
>> best answer.
>>
>> Assume everything preserves MemorySSA.
>> We still can't really cache this sanely with a side-cache: either it
>> gets blown away, or takes up too much space.
>> Caching the result of every load without rewriting is another O(N)
>> space factor (we did it before, and george had to undo it :P).
>>
>> So now you wasted a bunch of time looking up the real clobbering
>> result for the load.
>> So will the next pass, and the pass after that, and ...
>>
>> Stores get pushed *upwards* a lot less than loads, so calling
>> getClobberingMemoryAccess on them is more rare, and caching them is
>> more sane, and "misses" are lower cost.
>>
>> For loads, we can work around having to have a separate cache by
>> rewriting the defining access in the main walker as a cache (and
>> marking them as optimized with a single bit or something, so we know
>> not to waste time doing it again until the defining access gets reset
>> again), but then the walker is mutating the MemorySSA, which is "not
>> ideal" if we can avoid it.
>>
>> If we can't, such is life, i'm basically trying to kick this can as
>> far down the road as i can before we use that solution :)
>>
>> But y'all are the ones doing a lot of the work at this point, so let
>> me know if you think we should stop trying to maintain these invariants.
>>
>> Maybe for that reason it makes more sense to let the client
>> decide whether they want to do the simple RAUW with
>> getClobberingAccess(Store) or optimize each use separately?
>>
>>
>> If we can make it come out optimal for everyone, we should do that.
>> Otherwise, one client screws up another :)
>>
>> If we can't, yeah, sure, then we start making tradeoffs.
>>
>>
>
--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.