[PATCH] D8688: Update MergedLoadStoreMotion to use MemorySSA

Gerolf Hoflehner via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 5 21:14:55 PDT 2016


> On Jun 24, 2016, at 10:41 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> 
> On Fri, Jun 24, 2016, 5:16 PM Gerolf Hoflehner <ghoflehner at apple.com> wrote:
> Gerolf added a comment.
> 
> I'll take a more in-depth look into the load and store merge routines as well. At first glance it seems one could just add a few MSSA hooks rather than copy-paste-modify the code base.
> It is possible, but the other version is O(n^3) and the MemorySSA version is O(n), so this is deliberate. It is not possible to make the other version better than O(n^2) without memory versions.
I think we should get both: the smaller functions in the original design and the O(n) behavior. But this can be done after the move to MSSA.
> 
> I also took a (late :-() look at the core MSSA design. Could you outline the high-level MSSA design in more detail than http://reviews.llvm.org/D7864? E.g.:
> 
> There are also a bunch of mailing list threads and the comment in memoryssa.h, but happy to expand on whatever for you :)
> 
> 
> > How does it compare/relate to http://www.airs.com/dnovillo/Papers/mem-ssa.pdf? Thanks!
> 
> Sure.
> 
> 
> This paper covers what was the first version of memoryssa for GCC.  Over time, we significantly simplified the design after getting years of experience with clients.   GCC's current design is now identical to the llvm one.
> 
> Compared to the paper, in both implementations there are no kills anymore and no partitioning. All loads and stores use and define a single MemorySSA variable instead of multiple ones representing heap partitions. There is a long discussion of why in one of the threads, but it boils down to: it's not worth making it more precise for the cost you pay in time, memory, and code complexity, given what clients really ended up wanting out of it.
> 
> This pass is a good example. Having multiple variables would not make it faster or more precise for load or store merging, and may in fact make it much slower. For passes that care, the walker interface provides a way to further try to disambiguate def-def and def-use chains.
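> 
> To make the single-variable model concrete, here is a minimal sketch (hypothetical IR; the globals and values are made up). Both stores define the one memory state even though they do not alias:
> 
>   ; 1 = MemoryDef(liveOnEntry)
>   store i32 0, i32* @a
>   ; 2 = MemoryDef(1)
>   store i32 1, i32* @b
>   ; MemoryUse(2)
>   %x = load i32, i32* @a
> 
> The load's immediate defining access is 2 (the store to @b), but a walker query such as getClobberingMemoryAccess can walk past it and return 1, the store that actually clobbers @a.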

OK, I see. Thanks!
> 
> -Gerolf
> 
> 
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:346
> @@ -271,1 +345,3 @@
>    HoistedInst->insertBefore(HoistPt);
> +  if (MSSA) {
> +    // We also hoist operands of loads using this function, so check to see if
> ----------------
> OK, it would probably not be possible to foresee all use cases. I certainly agree with the iterative approach you propose. I'm surprised, though, that a simple replacement of a load has not come up elsewhere.
> 
> This is the first major pass conversion outside of earlycse. I am working pass by pass to preserve it for gvn.
> 
> Note that this is also not simple load replacement; it's merging multiple loads into a single one by inserting a new load and removing the old ones. However, the operations performed are exactly what happens to the non-MemorySSA IR: you are inserting a new load and removing the two old ones.
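> 
> As a sketch of the shape of the transform (hypothetical IR and block names), before, with the same load in both arms of a diamond:
> 
>   if.then:
>     %v0 = load i32, i32* @g
>     br label %merge
>   if.else:
>     %v1 = load i32, i32* @g
>     br label %merge
> 
> and after, with one new load inserted above the split and the two originals (and their MemoryUses) removed:
> 
>   entry:
>     %v = load i32, i32* @g
>     br i1 %c, label %if.then, label %if.else
> 
> Uses of %v0 and %v1 get rewritten to use %v, mirroring the update that already happens on the plain IR.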
> 
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:1032
> @@ +1031,3 @@
> +
> +void MergedLoadStoreMotion::updateMemorySSAForSunkStores(
> +    BasicBlock *SinkLocation, StoreInst *S0, StoreInst *S1, StoreInst *SNew) {
> ----------------
> 
> What got me here is that I expected the update to be simple.
> 
> It would be if the stores were not may defs. Because they are, there may be more of them below in the same block that you are not touching.
> 
> I.e.:
> 
>   1 = MemoryDef(0)
>   store a
>   2 = MemoryDef(1)
>   store b
> 
> When you move the store to a, you must update b's defining access to 0.
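> 
> For illustration, using the numbering above and assuming store a is the one being sunk out of this block, the block afterwards reads:
> 
>   2 = MemoryDef(0)
>   store b
> 
> The merged store gets its own MemoryDef at the sink point, and uses of the old defs below that point are rewired to it.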
> 
> It is not possible to avoid this in all cases. Even if you had multiple MemorySSA variables, there is no perfect partitioning of the heap that guarantees you won't have such a use.
> 
> If may-defs did not record a defining access, you would not easily know how far a given MemoryDef version is valid, or be able to tell which may-defs might kill your store. In the above design, finding the lowest sink points and uses for a given store is easy, because you can walk the use chains. This makes a number of optimization algorithms that are otherwise quadratic, linear.
> 
> Why is this more involved than replacing two MemoryDefs with one, and possibly removing a MemoryPhi?
> 
> In the simple case, where there are no stores below you, it is in fact just that :)
> 
> The other complication is that a MemoryPhi may already exist in the block below you (imagine our store a, store b case on both sides of the diamond: a phi will exist in the merge block, and your insertion point is after the phi).
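> 
> A sketch of that case (block names and numbering made up for illustration):
> 
>   if.then:
>     1 = MemoryDef(0)
>     store a
>     2 = MemoryDef(1)
>     store b
>   if.else:
>     3 = MemoryDef(0)
>     store a
>     4 = MemoryDef(3)
>     store b
>   merge:
>     5 = MemoryPhi({if.then,2},{if.else,4})
> 
> The merged store sunk into merge has to be inserted after 5 and typically uses 5 as its defining access, with the phi's incoming values updated to whatever defs remain in each arm.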
> 
> From the client's perspective, updates like this look too complicated to get right.
> FWIW: I disagree, but I admit I've written the MemorySSA update routines for a ton of both GCC and LLVM passes. This is not materially different from the IR update you have to perform if you sink stores and their operands with PRE.
> This is pretty much as complicated as it gets as well, because it's code motion, not just replacement with existing values.
> Maybe there is a verifier in place, but in the code I don't even see a single assertion that could help debug issues should something go wrong.
> There is a verifier that is run in both the updating unit tests and the test cases for this pass, which verify that the updating is correct.
> Past that, the removal and insertion routines will assert if you do something that is obviously wrong, much more than the IR does. The verifier in both cases is used to catch screwups that would cost too much to verify at all times.
> The verifier can be called at any time, and I'm happy to add a call to it here under XDEBUG if you like.
> 
> Perhaps we can come up with some utility routines that make this simpler to grasp: when two or more memory operations get moved, this is the blueprint for updating MSSA and guaranteeing it stays consistent.
> 
> I think we could in general, but probably not for this kind of specialized code motion.
> 
> The simple CSE case is already pretty simple (for loads, you do nothing but remove the dead load; for stores, replaceAllUsesWith your defining access and remove the dead store).
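> 
> A minimal sketch of the store case (hypothetical IR; assume the second store is the redundant one):
> 
>   1 = MemoryDef(0)
>   store i32 %v, i32* %p
>   2 = MemoryDef(1)
>   store i32 %v, i32* %p
>   3 = MemoryDef(2)
>   store i32 %w, i32* %q
> 
> Removing the redundant store means replacing all uses of 2 with its defining access 1 (so 3 becomes MemoryDef(1)), then deleting 2. The load case needs no rewiring at all, because a MemoryUse has no uses of its own.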
> 
> Note that only stores are ever complex to reason about when updating. Because loads are just uses, they never require much: doing things to them can't change the reaching versions or require new phi nodes (unless you sink loads instead of hoisting them, because both MemorySSA and the normal IR are pruned SSA, so they eliminate phi nodes that have no uses. We could turn this back off if we wanted to guarantee that no load sinking would ever require a new phi node, but I have not hit a pass that wants to sink loads).
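> 
> A sketch of why load sinking is the one exception (hypothetical blocks; assume nothing below the merge touches memory):
> 
>   if.then:
>     1 = MemoryDef(0)
>     store a
>     MemoryUse(1)
>     load a
>   if.else:
>     ; no memory accesses
>   merge:
>     ; no MemoryPhi here: pruned SSA only creates phis that have uses
> 
> Sinking the load into merge would require inserting a new MemoryPhi({if.then,1},{if.else,0}) for it to use; hoisting a load, by contrast, never needs a new phi.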
> 
> For stores, general downward code motion can get complex enough that it may be simpler and easier not to try to preserve the analysis. This is true of normal SSA as well. This pass is pretty much the limit of what I'd try to preserve rather than have it just recalculate MemorySSA. The set of update cases is fixed and formally provable.
> 
> But thoughts definitely welcome.
> 
> ================
> Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:1099
> @@ +1098,3 @@
> +
> +  BasicBlock *MD0Block = MD0->getBlock();
> +  for (auto UI = MD0->use_begin(), UE = MD0->use_end(); UI != UE;) {
> ----------------
> There could be a lambda or function that takes care of the two loops.
> 
> http://reviews.llvm.org/D8688