[PATCH] D8688: Update MergedLoadStoreMotion to use MemorySSA

Fri Jun 24 22:41:22 PDT 2016

On Fri, Jun 24, 2016, 5:16 PM Gerolf Hoflehner <ghoflehner at apple.com> wrote:

> Gerolf added a comment.
>
> I'll take a more in depth look into the load and store merge routines
> also. At a first glance it seems  one could just add a few MSSA hooks
> rather than copy-paste-modify the code base.
>
It is possible, but the other version is o(n^3), the memory SSA version is
o(n), so this is deliberate. It is not possible to make the other version
less than n^2 without memory versions.

I also take a (late :-() look at the core MSSA design. Could you outline
> the high-level MSSA design in more detail than
> http://reviews.llvm.org/D7864?Eg.

There are also a bunch of mailing list threads and the comment in
memoryssa.h, but happy to expand on whatever for you :)

>how does it compare/relate to
http://www.airs.com/dnovillo/Papers/mem-ssa.pdf?  Thanks!

Sure.

This paper covers what was the first version of memoryssa for GCC.  Over
time, we significantly simplified the design after getting years of
experience with clients.   GCC's current design is now identical to the
llvm one.

Compared to the paper, in both implementations, there are no kills anymore,
and no partitioning. All loads and stores use and define a single memoryssa
variable instead of multiple ones representing heap partitions.   There is
a long discussion of why in one of the threads, but it boils down to: it's
not worth making it more precise for the cost you pay in time or memory or
reasoning or complexity of code, for what clients really ended up wanting
out of it.

This pass is a good example. Having multiple variables  would not make it
faster or more precise for load or store  merging  and may in fact make it
much slower.  For passes that care, the walker interface provides a way to
further try to disambiguate def def and def use chains.

> -Gerolf
>
>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:346
> @@ -271,1 +345,3 @@
>    HoistedInst->insertBefore(HoistPt);
> +  if (MSSA) {
> +    // We also hoist operands of loads using this function, so check to
> see if
> ----------------
> Ok, it would probably not be possible foresee all use cases. I certainly
> agree to the iterative approach you propose. I'm surprised though that a
> simple replacement of a load has not come up elsewhere.
>

This is the first major pass conversion outside of earlycse. I am working
pass by pass to preserve it for gvn.

Note that this is also not simple load replacement, it's merging multiple
loads into a single one by insertion of a new load and removing.  However,
the operations performed are exactly what is happening to the non memoryssa
ir, which is that you are inserting a new load, and removing the two old
ones.

>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:1032
> @@ +1031,3 @@
> +
> +void MergedLoadStoreMotion::updateMemorySSAForSunkStores(
> +    BasicBlock *SinkLocation, StoreInst *S0, StoreInst *S1, StoreInst
> *SNew) {
> ----------------
>
> What got me here is that I expected the update to be simple.

It would be if the stores were not may defs. Because they are, there may be
more of them below in the same block that you are not touching.

Ie.
1=memorydef(0)
Store a
2= memorydef(1)
Store b

When you move a, you must update b's defining access to 0

It is not possible to avoid this in all cases.   Even if you had multiple
memoryssa variables, there is no perfect partitioning of the heap that
guarantees you won't have a use.

If you removed the defining access of may defs, you would not easily know
how far a given memorydef version is valid for, or be able to tell the may
defs that might kill your store.  In the above design, finding the lowest
sink points  and uses  for a given store is easy, because you can walk the
use chains.    This makes a number of optimization algorithms that are
otherwise quadratic, linear.

Why is here more involved than replacing two MD's with one, possibly
> removing a MP?

In the simple case, where there are no stores below you, it is in fact,
that :)

The other complication is that a mp may already exist in the block below
you (imagine our store a, store b case in both sides of the diamond. A phi
will exist in the merge block and your insertion point is after the phi)

>From the clients perspective updates like this look too complicated to get
> right.

Fwiw: I disagree, but I admit I've written the memoryssa update routines
for a ton of  both gcc and llvm passes.  This is not materially different
from the ir update you have to perform if you sink stores and their
operands with pre.
This is pretty much as complicated as it gets as well, because it's code
motion, not just replacement with existing values.

> Maybe there is verifier in place, but in the code I don't even see a
> single assertion but could help debugging issues should something go wrong.

There is a verifier that is run in both the updating  unit tests and the
test cases for this pass that verify the updating is correct.
Past that, removal and insertion routines will assert if you do something
that is obviously wrong, much more than the ir does The verifier in both
cases is used to catch screwups that cost too much to verify at all times.
The verifier can be called at any time, and I'm happy to add a call to it
here under xdebug if you like.

Perhaps we can come up with some utility routines that make simpler to
> grasp: when two memory or more memory operation get moved, this is the
> blueprint for updating MSSA and guarantee it stays consistent.
>

I think we could in general, but probably not for this kind of specialized
code motion.

The simple cse case is already pretty simple (for loads, you do nothing but
remove the dead load. For stores , replacealluseswith your definingacess,
Remove the dead store).

Note that only stores are ever complex to reason about when updating.
Because loads are just uses, they never require much because doing things
do them can't change the reaching versions or require new phi nodes (unless
you sink loads instead of hoist them, because both memoryssa and the normal
ir are pruned ssa, so they eliminate phi nodes that have no uses. We could
just turn this back off if we wanted to guarantee no load sinking would
ever require a new phi node, but I have not hit a pass that wants to sink
loads)

For stores, general downward code motion can get complex enough that it may
be simpler and easier to not try to preserve the analysis.   This is true
of normal ssa as well.   This pass is pretty much the limit of what I'd try
to preserve rather than have it just recalculate memoryssa. The set of
update cases is fixed and formally provable.

But thoughts definitely welcome.

>
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:1099
> @@ +1098,3 @@
> +
> +  BasicBlock *MD0Block = MD0->getBlock();
> +  for (auto UI = MD0->use_begin(), UE = MD0->use_end(); UI != UE;) {
> ----------------
> There could be a lambda or function that takes care of the two loops..
>

> http://reviews.llvm.org/D8688
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160625/9a2743c9/attachment.html>