[PATCH] D19821: [EarlyCSE] Optionally use MemorySSA. NFC.

Tue Aug 23 10:47:20 PDT 2016

On Tue, Aug 23, 2016 at 10:17 AM, Geoff Berry <gberry at codeaurora.org> wrote:

> gberry added inline comments.
>
> ================
> Comment at: lib/Transforms/Scalar/EarlyCSE.cpp:537
> @@ +536,3 @@
> +  MemoryAccess *LaterDef =
> +      MSSA->getWalker()->getClobberingMemoryAccess(LaterInst);
> +  return MSSA->dominates(LaterDef, MSSA->getMemoryAccess(EarlierInst));
> ----------------
> dberlin wrote:
> > gberry wrote:
> > > dberlin wrote:
> > > > 1. For loads, you don't have to ask for the clobbering access. It's
> already optimized such that getDefiningAccess == the clobbering access
> > > >
> > > > 2. For stores, not sure if you realize this, but
> > > >
> > > > given
> > > > store q  (let's call this a)
> > > > x = load p
> > > > store q (let's call this b)
> > > >
> > > >
> > > > if you call getClobberingMemoryAccess on b, it will return a.
> > > >
> > > >
> > > For 1., I was not clear on whether this holds true after store removal.
> > >
> > > For 2., yeah I get this, I'm not sure what you're getting at though.
> The removal of this second store by EarlyCSE doesn't use MemorySSA to check
> for intervening loads in this change.  It uses the 'LastStore' tracking to
> know when a store made redundant by a second store can be removed.
> > 1. Updates have to make it hold after store removal :)
> >
> > The problem is that if we don't keep this invariant up to date, it means
> everyone uses getClobberingAccess, which does a bunch of work to discover
> the load already points to the same thing.
> >
> > The cost of everyone doing that is much higher than the cost of one
> person updating the dominating def.
> >
> > (there is one case where getClobberingAccess will give you a better
> answer, and that is on cases where we gave up during use optimization. I
> only have one testcase this occurs on.  We only give up on optimizing a
> load if it's going to be super expensive, and you probably do *not* want to
> try to get better answers in that case).
> >
> > As for updating when you remove stores, you should simply be able to
> replace the store, in any loads that use it, with getClobberingAccess(store) using RAUW.
> >
> > Under the covers, removeMemoryAccess calls RAUW with the DefiningAccess.
> > We could change it to use getClobberingMemoryAccess for loads, and
> DefiningAccess for stores.
> >
> > 2. ah, okay.
> >
> >
> Okay, I get why just checking the defining access for loads is better (we
> get to skip the AA check).
>

> For stores, we may be able to do something faster than calling
> getClobberingAccess(store).  We could instead walk up the store defining
> access chain and stop if we get to a point that dominates the earlier load
> or a clobbering write.  I'll have to time this to see if it makes a
> difference.  I guess it will depend on what percentage of the time the
> clobber cache has been thrown away.
>

Yes.  You are basically designing a custom walker here, and that's great.
If it matters, I would just make it a walker class and
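
To make the walk described above concrete, here is a minimal sketch on a toy single-block model. It is not the MemorySSA API: ToyAccess, toyDominates and toyNoClobberBetween are hypothetical names, "dominates" is reduced to block order, and aliasing is reduced to comparing an integer location, where real code would ask alias analysis.

```cpp
#include <cassert>

// Hypothetical toy model of one basic block; none of these names are LLVM API.
struct ToyAccess {
  int Order;            // position in the block, smaller is earlier
  int Location;         // abstract location written; -1 for loads and live-on-entry
  ToyAccess *Defining;  // next def up the chain; nullptr at live-on-entry
};

// Within a single block, A dominates B when A is not later than B.
bool toyDominates(const ToyAccess &A, const ToyAccess &B) {
  return A.Order <= B.Order;
}

// Walk up from Later's defining access. Return true if we reach a def that
// dominates Earlier before seeing a write to Location, false on a clobber.
bool toyNoClobberBetween(const ToyAccess &Earlier, const ToyAccess &Later,
                         int Location) {
  for (const ToyAccess *Cur = Later.Defining; Cur; Cur = Cur->Defining) {
    if (toyDominates(*Cur, Earlier))
      return true;   // everything from here up is at or before Earlier
    if (Cur->Location == Location)
      return false;  // clobbering write between Earlier and Later
  }
  return true;  // ran off the top of the chain without finding a clobber
}
```

The walk stops at the first def that is at or above the earlier access, so its cost is bounded by the number of defs between the two instructions rather than by the whole chain.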

>
> As for updating when removing stores: it seems like doing RAUW
> getClobberingAccess(store) is not optimal in some cases.

Just out of curiosity do you have a real example of the below?  We work
really hard to avoid generating that.

> For example:
>
>   store @G1 ; 1 = MD(entry)
>   store @G2 ; 2 = MD(1)
>   store %p ; 3 = MD(2)
>   load @G1 ; MU(3)
>   load @G2  ; MU(3)
>

These should not be MU(3), they should be MU(1) and MU(2), unless the store
is *actually* a clobber for them. If the store is actually a clobber, i
don't see how you could legally remove or move the store (since it
conflicts with G1/G2) :)

>
> If we remove 3 and RAUW getClobberingAccess(3) (=2) we get:
>
>   store @G1 ; 1 = MD(entry)
>   store @G2 ; 2 = MD(1)
>   load @G1 ; MU(2)
>   load @G2  ; MU(2)

I agree that if you can get memoryssa to generate the first, you will get
the second and it will be non-optimal.  If you can, and it's legal to move
or remove the store, that seems like a super-rare case unless i'm missing
something?
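
For reference, the removal step in the quoted example can be sketched on a toy model. This assumes MemorySSA really did produce MU(3) for both loads, which is exactly the premise in question. ToyAccess, toyReplaceAllUsesWith and toyRemoveStoreExample are hypothetical names, not the LLVM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy access: an id plus the def it points at. Not the LLVM API.
struct ToyAccess {
  int Id;
  ToyAccess *Defining;  // nullptr for live-on-entry
};

// Toy RAUW: every access that used Old as its defining access now uses New.
void toyReplaceAllUsesWith(std::vector<ToyAccess *> &Accesses, ToyAccess *Old,
                           ToyAccess *New) {
  for (ToyAccess *A : Accesses)
    if (A->Defining == Old)
      A->Defining = New;
}

// Builds the three-store, two-load example, removes store 3, and returns the
// id of the def that the load of @G1 ends up pointing at.
int toyRemoveStoreExample() {
  ToyAccess Entry{0, nullptr};
  ToyAccess S1{1, &Entry};   // store @G1
  ToyAccess S2{2, &S1};      // store @G2
  ToyAccess S3{3, &S2};      // store %p
  ToyAccess LoadG1{4, &S3};  // load @G1, MU(3)
  ToyAccess LoadG2{5, &S3};  // load @G2, MU(3)
  std::vector<ToyAccess *> All{&Entry, &S1, &S2, &S3, &LoadG1, &LoadG2};
  toyReplaceAllUsesWith(All, &S3, S3.Defining);
  return LoadG1.Defining->Id;  // 2, although 1 would be more precise
}
```

A single RAUW gives every former user of the store the same replacement, so it cannot hand the load of @G1 a different def than the load of @G2.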

If you have a testcase with a real example, that would be super-interesting
to me.
Because if we can't get it to generate precise info in most if not all
cases, it's  possibly not worth doing the work we are doing to optimize
uses.

I don't claim we make the perfect set of tradeoffs right now. I know for
the use cases i cared about, it makes an IMHO good set of choices.

>
> but the load @G1 would be more precise if it was MU(1) (and the invariant
> that defining access == clobbering access would be broken).  Is this just a
> compile-time/precision trade-off?

Kinda. The main problem is it affects everyone downstream *right now*
(again, we could change this at the cost of having the walker mutate loads).

Imagine something upstream of you generates imprecise info for loads.
You use getClobberingAccess to work around it, because you want the best
answer.

Assume everything preserves MemorySSA.
We still can't really cache this sanely with a side-cache:  either it gets
blown away, or takes up too much space.
Caching the result of every load without rewriting is another O(N) space
factor (we did it before, and george had to undo it :P).

So now you wasted a bunch of time looking up the real clobbering result for
the load.
So will the next pass, and the pass after that, and ...

Stores get pushed *upwards* a lot less than loads, so calling
getClobberingMemoryAccess on them is more rare, and caching them is more
sane, and "misses" are lower cost.

For loads, we can work around having to have a separate cache by rewriting
the defining access in the main walker as a cache (and marking them as
optimized with a single bit or something, so we know not to waste time
doing it again until the defining access gets reset again), but then the
walker is mutating the MemorySSA, which is "not ideal" if we can avoid it.

If we can't, such is life, i'm basically trying to kick this can as far
down the road as i can before we use that solution :)
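
A rough sketch of that fallback, on toy types rather than the real MemorySSA classes: the walker stores its answer in the load's defining access and sets an "optimized" bit, and anything that resets the defining access clears the bit. ToyDef, ToyUse, toyGetClobber and toySetDefining are hypothetical names, and aliasing is reduced to comparing an integer location.

```cpp
#include <cassert>

// Hypothetical toy types; the real MemorySSA classes look different.
struct ToyDef {
  int Location;      // abstract location written; -1 for live-on-entry
  ToyDef *Defining;  // nullptr for live-on-entry
};

struct ToyUse {
  int Location;      // abstract location read
  ToyDef *Defining;
  bool Optimized;    // set once Defining is known to be the clobber
};

// Return the nearest def that writes U.Location (or live-on-entry), caching
// the answer in U itself. Steps counts how many chain links were walked.
ToyDef *toyGetClobber(ToyUse &U, unsigned &Steps) {
  if (U.Optimized)
    return U.Defining;
  ToyDef *Cur = U.Defining;
  while (Cur->Defining && Cur->Location != U.Location) {
    Cur = Cur->Defining;
    ++Steps;
  }
  U.Defining = Cur;  // the walker mutates the use here (the "not ideal" part)
  U.Optimized = true;
  return Cur;
}

// Anything that changes the defining access must clear the bit again.
void toySetDefining(ToyUse &U, ToyDef *D) {
  U.Defining = D;
  U.Optimized = false;
}
```

The second query on the same load returns immediately without walking, and no side table is needed. The price is that a query now writes to the data structure.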

But y'all are the ones doing a lot of the work at this point, so let me
know if you think we should stop trying to maintain these invariants.

> Maybe for that reason it makes more sense to let the client decide if they
> want to do the simple RAUW getClobberingAccess(Store) or optimize each use
> separately?
>
>
If we can make it come out optimal for everyone, we should do that.
Otherwise, one client screws up another :)

If we can't, yeah, sure, then we start making tradeoffs.