[PATCH] Update MergedLoadStoreMotion to use MemorySSA

Daniel Berlin dberlin at dberlin.org
Thu Apr 9 15:56:32 PDT 2015


On Thu, Apr 9, 2015 at 2:58 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> On 04/09/2015 10:15 AM, Daniel Berlin wrote:
>>>
>>> During building, MemorySSA currently optimizes uses to the nearest
>>> dominating clobbering access for you, done at the suggestion of 2
>>> people.  We could make it stop doing this (it's just a flag), but the
>>> tradeoff is most things end up walking more in practice.
>>> Additionally, as things optimize, and you update memssa, you will
>>> often end up with the *same result* anyway.  The only way to stop that
>>> is to make it so you are required to use "nearest dominating
>>> memorydef" instead of "dominating memorydef" when updating.  We do
>>> this in the APIs where we do insertion (addNewMemoryUse), but we
>>> don't check/verify it in replaceMemoryAccess-style APIs, we only
>>> check standard domination.
>
> Let me just say this back to you to make sure I understand what you're
> saying.
:)

>
> When looking at the def of a memory use, you are guaranteed to get a
> potentially clobbering def.  You may skip over provably non-aliasing writes.
Yes

>
> When looking at the users of a memory def, you may see things that are
> dominated by a non-aliasing write between the def and the use.
Yes

Given
store a
store b
load a

you have (currently, with use optimization)
1 = MemoryDef(liveOnEntry)
store a

2 = MemoryDef(1)
store b

MemoryUse(1)
load a

immediate use lists will be:

store a -> store b (via its def link), load a
store b -> nothing


Note that we *only* allow this to happen for MemoryUse's. We do *not*
allow it to happen for the "use side" of a MemoryDef (which is not
really a use), because doing so may create multiple reaching
definitions for a single phi.

(see tests/Transforms/Util/memoryssa-no-disconnected.ll for an example)
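Roughly the shape of the problem (a made-up illustration, not the actual
test case):

entry:
1 = MemoryDef(liveOnEntry)
store a
br i1 %c, label %left, label %right

left:
2 = MemoryDef(1)
store b              (does not alias store a)
br label %merge

right:
br label %merge

merge:
3 = MemoryPhi({left, 2}, {right, 1})

If the use side of 2 were optimized to point at liveOnEntry instead of 1,
then 2 would no longer kill 1 on the def chain.  Along the left edge both 1
and 2 would then be reaching definitions, but the phi only gets one incoming
definition per edge, and the chain through {left, 2} would skip store a
entirely.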



> When updating, we require that these properties be preserved.
Yes

>
>> The other tradeoff, btw, is that while doing MemoryUse optimization
>> while building makes this particular algorithm slightly harder, it
>> makes an (IMHO) much more useful thing a lot easier.
>>
>> A lot of passes (GVN, memcpyopt, etc) want to know whether they can
>> replace a load with a load to eliminate something:
>>
>> 1 = MemoryDef(liveOnEntry)
>> store a
>> 2 = MemoryDef(1)
>> store b
>> 3 = MemoryDef(2)
>> store c
>> 4 = MemoryDef(3)
>> store d
>> MemoryUse(1)
>> load a, 4 bytes
>> MemoryUse(1)
>> load a, 8 bytes
>> (or different types, or whatever)
>>
>> By optimizing MemoryUse's during building, we are guaranteed that if
>> we do getMemoryAccess(load a)->definition (which is store a)->uses,
>> we now have all loads that actually use *that* store's value. This
>> means we can look at the other loads of that store's value and see if
>> we can reuse them.
>
> Just to be clear, it's not that you know the uses are must-alias with the
> def, it's that you know the def may alias any use and there are no
> may-alias writes between the def and the use, right?

Correct.
It may also be must-alias, but that requires a later check.
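Spelled out against the example above (this is the walk, not exact API
spelling):

getMemoryAccess(load a, 4 bytes)  ->  MemoryUse(1)
defining access of that use      ->  1 = MemoryDef(liveOnEntry)  (store a)
use list of 1                    ->  2 = MemoryDef(1)            (store b)
                                     MemoryUse(1)                (load a, 4 bytes)
                                     MemoryUse(1)                (load a, 8 bytes)

Filter that list down to the MemoryUse's and you have the candidate loads;
whether a candidate is actually must-alias/type-compatible is the later
check.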


>   I think that's what you
> were saying, just confirming.
>>
>>
>> Doing this without use optimization would be much harder.
>> You'd need to use a downwards API from the store to get all possible
>> real uses, which may encompass walking the entire program.
>>
>> On the other hand, the tradeoff of doing MemoryUse optimization is
>> that it means to get all the loads in a block, you really have to ask
>> for all the loads in the block, instead of hoping the chains give
>> them to you :)
>
> This part now makes sense.
>
> However, can't you still do better for this use case?
>
> When looking at lifting loads, can't you look at the def of each load and
> then move that load immediately following the def?
Yes.
You can, in fact, do a much better algorithm than this one.

You could do:

1. Hoist load operands, top down (and recursively), as high as possible.
2. Hoist loads in forward order, to the nearest-common-postdominator
(or something) of (all operand definitions, all upwards uses, all
clobbering writes).

In fact, discovering the highest placement points is the same thing as
computing ANTIC + safety for loads in PRE :)

The opposite with stores:
1. Sink all uses as low as possible.
2. Sink all stores to nearest_common_dominator(all store uses, all
clobbering writes); the simple diamond case of this is sketched below.

This assumes no phi additions.  If you are willing to do phi
insertion, you can also consider placement in other places (like a
branch).
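For the simple diamond that MergedLoadStoreMotion handles, the sinking
case looks like this (illustration only; 1 is whatever def reaches the
diamond):

left:
2 = MemoryDef(1)
store a
br label %merge

right:
3 = MemoryDef(1)
store a
br label %merge

merge:
4 = MemoryPhi({left, 2}, {right, 3})

If nothing between either store and %merge reads or clobbers a, the nearest
common dominator of everything downstream that cares is %merge or later, so
both stores can sink to a single store at the top of %merge; if the two
stored values differ, you also need a phi for the stored value.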

In general, the whole store algorithm is a special case of very busy
expressions based hoisting on the reverse flowgraph.
The whole load algorithm is a special case of very busy expressions
based hoisting on the forward flowgraph.

(I don't say PRE because standard PRE is LCM, and LCM would not move
these loads/stores from where they are)


>  If the def is in the
> common predecessor block and we have a duplicate load in the other block,
> this accomplishes lifting.  (By definition, duplicated loads would share a
> def.)
>
> The key point here is that it feels odd to need to look at loads following
> the first (aliasing!) store in the block.  We might be able to merge two
> *stores* and then recurse, but if you can't move a clobbering store, you
> definitely can't move any of its uses.
>
> And... I think I finally see your point.  The challenge is that you can't
> identify whether a load is clobbered by a given store without looking at
> its def directly.  The mere presence of the store does not imply
> clobbering.  Gotcha.

Correct

>
> Just to make sure I'm following, it would be *legal* to look at all uses of
> the memory def obtained by following the edge from Load0 to identify
> candidates for Load1, right?
Yes
>   You are guaranteed that any legal candidate for
> merging must be in the use list since it would have to be clobbered by the
> same store.

Yes
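Picture the simple diamond again (illustration only):

entry:
1 = MemoryDef(liveOnEntry)
store a
br i1 %c, label %left, label %right

left:
2 = MemoryDef(1)
store b              (does not alias a)
MemoryUse(1)
load a               (Load0)
br label %merge

right:
MemoryUse(1)
load a               (Load1)
br label %merge

store b sits between store a and Load0, but Load0's def edge goes straight
to 1, so the presence of store b by itself tells you nothing.  And both
loads are on 1's use list, so walking that list from Load0's def finds
every candidate for merging.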

>
> p.s. A comment explaining the equivalence sets that are defined for LoadInfo
> and StoreInfo would be useful.  In particular, the equality of the def being
> a way to short circuit the alias check is a useful but confusing trick.

Yes, I will add more comments.

>
> Philip
>


