[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.
Geoff Berry via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 25 08:36:49 PDT 2016
Sounds good. I'll go ahead and check in what I have now (since it is
NFC) and we can proceed from there.
I'm currently trying out an approach where, upon removing a store, I
eagerly re-optimize phis but just mark uses as dirty, then only
re-compute the clobber for dirty uses if they come up in an
isSameMemGeneration query or at the end of the pass. This is sort of a
hybrid of the two approaches you described.
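
Roughly the shape of this (sketch only; the DirtyUses set and helper
names below are made up, the eager phi re-optimization isn't shown, and
only getDefiningAccess/getClobberingMemoryAccess are the real MemorySSA
calls):

  // Uses whose cached defining access may be stale because a store they
  // hung off of was removed.
  SmallPtrSet<MemoryUse *, 16> DirtyUses;

  // Called just before removing a store's MemoryDef: don't re-optimize
  // its users eagerly, just remember them.
  void markUsesDirty(MemoryDef *StoreAccess) {
    for (User *U : StoreAccess->users())
      if (auto *MU = dyn_cast<MemoryUse>(U))
        DirtyUses.insert(MU);
  }

  // Called from isSameMemGeneration (and over any leftovers at the end
  // of the pass): only pay for a clobber walk if the use is dirty.
  MemoryAccess *getClobber(MemoryUse *MU) {
    if (!DirtyUses.count(MU))
      return MU->getDefiningAccess(); // still optimized, cheap
    MemoryAccess *Clobber =
        MSSA->getWalker()->getClobberingMemoryAccess(MU->getMemoryInst());
    DirtyUses.erase(MU);
    return Clobber;
  }
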
On 8/24/2016 7:34 PM, Daniel Berlin wrote:
> (modulo this, btw, the code LGTM. I definitely don't think we should
> wait to resolve this before you commit it).
>
>
> On Tue, Aug 23, 2016 at 1:32 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
> Ugh.
> Okay, so.
> A couple thoughts:
>
> 1. We could maintain it by doing batch use opt at the end of each
> pass or something, and then using the faster use optimizer to
> reoptimize affected loads (it's easy to tell the set of possibly
> affected loads).
>
> 2. We could drop it. We would still build an initially optimized
> form, but whatever happens past that, happens. We would make
> getClobberingMemoryAccess in the default walker modify loads to
> store the optimized result on them, and add a bool to MemoryUse to
> track which loads may need to be reoptimized (that gets reset in
> setDefiningAccess unless you tell it not to).
>
> This will let you call getClobberingMemoryAccess on everything
> without it being really expensive for loads (right now it does a
> bunch of search/etc setup, and we need a way to avoid this)
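>
> Very roughly, the shape of 2 (sketch only; the bit, walker, and helper
> names below are made up to illustrate the idea, not real API):
>
>   MemoryAccess *Walker::getClobberingMemoryAccess(MemoryAccess *MA) {
>     auto *MU = dyn_cast<MemoryUse>(MA);
>     if (MU && !MU->mayNeedReopt())
>       return MU->getDefiningAccess();         // already optimized, no search
>     MemoryAccess *Clobber = doClobberWalk(MA); // the expensive part
>     if (MU) {
>       // Cache the answer on the load itself and clear the "may need
>       // reopt" bit; setDefiningAccess would normally set it again
>       // unless told not to.
>       MU->setDefiningAccess(Clobber, /*MayNeedReopt=*/false);
>     }
>     return Clobber;
>   }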
>
> The downsides to all of this are:
>
> 1. The walker now modifies loads. Even if you are storing the
> defining accesses somewhere, and we change it (not sure why this
> would happen, but ...), it should be okay, it would just mean you
> have a valid but conservative answer.
>
> 2. It means a store's uses will no longer include all possible
> loads that die there. It will only include some, unless you run
> around and call getClobberingMemoryAccess on every load.
>
> I will have to do that in NewGVN to make load/store forwarding
> work sanely, but i'll deal with that.
>
>
> + a few others for thoughts.
>
>
> On Tue, Aug 23, 2016 at 1:04 PM, Geoff Berry <gberry at codeaurora.org> wrote:
>
>
>
> On 8/23/2016 2:17 PM, Geoff Berry via llvm-commits wrote:
>>
>>
>>
>> On 8/23/2016 1:47 PM, Daniel Berlin wrote:
>>>
>>>
>>> On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org> wrote:
>>>
>>> gberry added inline comments.
>>>
>>> ================
>>> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
>>> @@ +536,3 @@
>>> +    MemoryAccess *LaterDef =
>>> +        MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
>>> +    return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
>>> ----------------
>>> dberlin wrote:
>>> > gberry wrote:
>>> > > dberlin wrote:
>>> > > > 1. For loads, you don't have to ask for the
>>> clobbering access. It's already optimized such that
>>> getDefiningAccess == the clobbering access
>>> > > >
>>> > > > 2. For stores, not sure if you realize this, but
>>> > > >
>>> > > > given
>>> > > > store q (lets's call this a)
>>> > > > x = load p
>>> > > > store q (let's call this b)
>>> > > >
>>> > > >
>>> > > > if you call getClobberingMemoryAccess on b, it
>>> will return a.
>>> > > >
>>> > > >
>>> > > For 1., I was not clear on whether this holds true
>>> after store removal.
>>> > >
>>> > > For 2., yeah I get this, I'm not sure what you're
>>> getting at though. The removal of this second store by
>>> EarlyCSE doesn't use MemorySSA to check for intervening
>>> loads in this change. It uses the 'LastStore' tracking
>>> to know when a store made redundant by a second store
>>> can be removed.
>>> > 1. Updates have to make it hold after store removal :)
>>> >
>>> > The problem is that if we don't keep this invariant up
>>> to date, it means everyone uses getClobberingAccess,
>>> which does a bunch of work to discover the load already
>>> points to the same thing.
>>> >
>>> > Everyone doing that is much higher than the cost of
>>> one person updating the dominating def.
>>> >
>>> > (there is one case where getClobberingAccess will give
>>> you a better answer, and that is on cases where we gave
>>> up during use optimization. I only have one testcase
>>> this occurs on. We only give up on optimizing a load if
>>> it's going to be super expensive, and you probably do
>>> *not* want to try to get better answers in that case).
>>> >
>>> > As for updating when you remove stores, you should
>>> simply be able to replace the store in any loads that use it with
>>> getClobberingAccess(store) using RAUW.
>>> >
>>> > Under the covers, removeMemoryAccess calls RAUW with
>>> the DefiningAccess.
>>> > We could change it to use getClobberingMemoryAccess
>>> for loads, and DefiningAccess for stores.
>>> >
>>> > 2. ah, okay.
>>> >
>>> >
>>> Okay, I get why just checking the defining access for
>>> loads is better (we get to skip the AA check).
>>>
>>> For stores, we may be able to do something faster than
>>> calling getClobberingAccess(store). We could instead
>>> walk up the store defining access chain and stop if we
>>> get to a point that dominates the earlier load or a
>>> clobbering write. I'll have to time this to see if it
>>> makes a difference. I guess it will depend on what
>>> percentage of the time the clobber cache has been thrown
>>> away.
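>>>
>>> Something along these lines (sketch; the helper name is made up,
>>> MSSA/AA are assumed members, and MemoryPhis are just handled
>>> conservatively):
>>>
>>>   // Same question as
>>>   //   MSSA->dominates(getClobberingMemoryAccess(LaterStore), EarlierAccess)
>>>   // but stop walking as soon as the answer is known.
>>>   bool clobberDominatesEarlier(MemoryDef *LaterStoreAccess,
>>>                                MemoryAccess *EarlierAccess,
>>>                                const MemoryLocation &StoreLoc) {
>>>     MemoryAccess *A = LaterStoreAccess->getDefiningAccess();
>>>     while (true) {
>>>       // Everything further up the def chain dominates A, so once A
>>>       // dominates the earlier access the eventual clobber does too.
>>>       if (MSSA->dominates(A, EarlierAccess))
>>>         return true;
>>>       auto *D = dyn_cast<MemoryDef>(A);
>>>       if (!D)
>>>         return false; // MemoryPhi: give up conservatively
>>>       if (AA->getModRefInfo(D->getMemoryInst(), StoreLoc) & MRI_Mod)
>>>         return false; // found the clobbering write; it doesn't dominate
>>>       A = D->getDefiningAccess();
>>>     }
>>>   }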
>>>
>>>
>>> Yes. You are basically designing a custom walker here, and
>>> that's great.
>>> If it matters, I would just make it a walker class and
>>>
>>>
>>> As for updating when removing stores: it seems like
>>> doing RAUW getClobberingAccess(store) is not optimal in
>>> some cases.
>>>
>>>
>>> Just out of curiosity do you have a real example of the
>>> below? We work really hard to avoid generating that.
>>>
>>> For example:
>>>
>>> store @G1 ; 1 = MD(entry)
>>> store @G2 ; 2 = MD(1)
>>> store %p ; 3 = MD(2)
>>> load @G1 ; MU(3)
>>> load @G2 ; MU(3)
>>>
>>>
>>> These should not be MU(3), they should be MU(1) and MU(2),
>>> unless the store is *actually* a clobber for them. If the
>>> store is actually a clobber, i don't see how you could
>>> legally remove or move the store (since it conflicts with
>>> G1/G2) :)
>>>
>>>
>>> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>>>
>>> store @G1 ; 1 = MD(entry)
>>> store @G2 ; 2 = MD(1)
>>> load @G1 ; MU(2)
>>> load @G2 ; MU(2)
>>>
>>>
>>> I agree that if you can get memoryssa to generate the first,
>>> you will get the second and it will be non-optimal. If you
>>> can, and it's legal to move or remove the store, that seems
>>> like a super-rare case unless i'm missing something?
>>
>> Just to reply to this one particular point for now, here's a
>> case where this can happen because of EarlyCSE removing a
>> store that is simply storing back the value just loaded from
>> the same memory location:
>>
>>
>> define void @test3(i32* %p) {
>> entry:
>>   ; 1 = MemoryDef(liveOnEntry)
>>   store i32 1, i32* @G1
>>   ; 2 = MemoryDef(1)
>>   store i32 2, i32* @G2
>>   ; MemoryUse(2)
>>   %xp = load i32, i32* %p
>>   ; 3 = MemoryDef(2)
>>   store i32 %xp, i32* %p
>>   ; MemoryUse(3)
>>   %x1 = load i32, i32* @G1
>>   ; MemoryUse(3)
>>   %x2 = load i32, i32* @G2
>>   ; 4 = MemoryDef(3)
>>   call void @foo(i32 %x1, i32 %x2, i32 %xp)
>>   ret void
>> }
>>
>> Removing: store i32 %xp, i32* %p
>>
>> define void @test3(i32* %p) {
>> entry:
>>   ; 1 = MemoryDef(liveOnEntry)
>>   store i32 1, i32* @G1
>>   ; 2 = MemoryDef(1)
>>   store i32 2, i32* @G2
>>   ; MemoryUse(2)
>>   %xp = load i32, i32* %p
>>   ; MemoryUse(2)
>>   %x1 = load i32, i32* @G1
>>   ; MemoryUse(2)
>>   %x2 = load i32, i32* @G2
>>   ; 4 = MemoryDef(2)
>>   call void @foo(i32 %x1, i32 %x2, i32 %xp)
>>   ret void
>> }
>>
>
> ... and to push this case a bit further, it seems like it
> wouldn't be too hard to construct a case where removing this
> store would necessitate looking down through MemoryPhis in
> order to maintain the optimized-use invariant, possibly even
> leading to more MemoryPhi optimization?
>
>>
>>> If you have a testcase with a real example, that would be
>>> super-interesting to me.
>>> Because if we can't get it to generate precise info in most
>>> if not all cases, it's possibly not worth doing the work we
>>> are doing to optimize uses.
>>>
>>> I don't claim we make the perfect set of tradeoffs right
>>> now. I know for the use cases i cared about, it makes an
>>> IMHO good set of choices.
>>>
>>>
>>>
>>> but the load @G1 would be more precise if it was MU(1)
>>> (and the invariant that defining access == clobbering
>>> access would be broken). Is this just a
>>> compile-time/precision trade-off?
>>>
>>>
>>> Kinda. The main problem is it affects everyone downstream
>>> *right now* (again, we could change this at the cost of
>>> having the walker mutate loads).
>>>
>>> Imagine something upstream of you generates imprecise info
>>> for loads.
>>> You use getClobberingAccess to work around it, because you
>>> want the best answer.
>>>
>>> Assume everything preserves MemorySSA.
>>> We still can't really cache this sanely with a side-cache:
>>> either it gets blown away, or takes up too much space.
>>> Caching the result of every load without rewriting is
>>> another O(N) space factor (we did it before, and george had
>>> to undo it :P).
>>>
>>> So now you wasted a bunch of time looking up the real
>>> clobbering result for the load.
>>> So will the next pass, and the pass after that, and ...
>>>
>>> Stores get pushed *upwards* a lot less than loads, so
>>> calling getClobberingMemoryAccess on them is more rare, and
>>> caching them is more sane, and "misses" are lower cost.
>>>
>>> For loads, we can work around having to have a separate
>>> cache by rewriting the defining access in the main walker as
>>> a cache (and marking them as optimized with a single bit or
>>> something, so we know not to waste time doing it again until
>>> the defining access gets reset again), but then the walker
>>> is mutating the MemorySSA, which is "not ideal" if we can
>>> avoid it.
>>>
>>> If we can't, such is life, i'm basically trying to kick this
>>> can as far down the road as i can before we use that solution :)
>>>
>>> But y'all are the ones doing a lot of the work at this
>>> point, so let me know if you think we should stop trying to
>>> maintain these invariants.
>>>
>>> Maybe for that reason it makes more sense to let the
>>> client decide if they want to do the simple RAUW
>>> getClobberingAccess(Store) or optimize each use separately?
>>>
>>>
>>> If we can make it come out optimal for everyone, we should
>>> do that.
>>> Otherwise, one client screws up another :)
>>>
>>> If we can't, yeah, sure, then we start making tradeoffs.
>>>
>>>
>>
>
--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.