[PATCH] D8688: Update MergedLoadStoreMotion to use MemorySSA

Wed Jul 6 17:33:42 PDT 2016

dberlin marked an inline comment as done.
dberlin added a comment.

I'll be honest. I think we are going a bit overboard for something that is off by default and is meant to be an identical algorithm to an existing pass that we know can stand improvements.

While I am all for trying to get code to be as good as possible, i don't think we need to make it completely perfect on the first pass when it's easily subject to incremental improvements :)

IE i think there is some point that is "good enough" to start, particularly for things that we can improve before we flip them to be the default.

For all we know, changes we have to make to account for converting the other passes are going to also make us change code here anyway.


================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:93
@@ -91,1 +92,3 @@
+#include <unordered_set>
+#include <vector>
 
----------------
fixed

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:158
@@ +157,3 @@
+  // successors.
+  DenseMap<const Instruction *, unsigned> DFSNumberMap;
+  DenseMap<const BasicBlock *, unsigned> FirstHoistBarrier;
----------------
gberry wrote:
> Can't you just compute these instruction indices for the diamond then/else blocks lazily?  The same goes for LastSinkBarrier, which wouldn't even need a map in that case.
Yes, you can compute them once on-demand for the blocks, but it takes nearly zero time right now. It did not seem worth it for a first pass to build something to do it :)
Remember this is supposed to be an NFC conversion to start, and building the barrier maps already converts what was N^2 before to N.

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:771
@@ +770,3 @@
+  // This isUsedOutsideOfBlock is also conservative.
+  if (!Load1 || Load1->isUsedOutsideOfBlock(BB))
+    return false;
----------------
gberry wrote:
> Maybe check Load1->isSimple() here?  Maybe factor out all these checks into a function and use it for the Load0 hash table insertion as well?
Let us please not do that in this patch.
Doing that would change this algorithm to catch a different set of loads than the non-memoryssa version.
I would like to keep the algorithms identical in what they catch for the moment, so debugging is easy.

Eventually, when the mssa version is the default and the non-mssa version killed, we can improve it to be whatever we want.

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:799
@@ +798,3 @@
+
+  while (!Res && LookupIter != LookupResult.second) {
+    LoadInst *Load0 = LookupIter->first;
----------------
gberry wrote:
> Isn't this still O(M^2) if e.g. all of the loads have the same clobbering access and the same types? I guess the limit above controls how big M can get though.
If that was the case, all the loads in that block are equal (because they also passed isSameOperation, etc) and GVN should have removed them already :)

Note that neither the mssa version or the non-mssa version will handle the case of two identical loads in one side of the diamond, and one load in the other. It will only hoist/merge one of identical ones.

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:997
@@ +996,3 @@
+  // Seed with current users that are in the block range.
+  std::queue<MemoryAccess *> CurrentUses;
+  SmallPtrSet<const MemoryAccess *, 8> Visited;
----------------
gberry wrote:
> Is there a reason you're using a queue for this work-list instead of the more common SmallVector?
We push back and pull front because that exploration order guarantees we terminate at the earliest possible point.

If we popped off the back, it would be an order that would explore lower things before higher things.


http://reviews.llvm.org/D8688