[PATCH] D8688: Update MergedLoadStoreMotion to use MemorySSA

Fri Jul 8 14:00:49 PDT 2016

gberry added a comment.

I'm personally fine with committing this as is since it's is disabled, though I don't feel I have the authority to approve it.

I do have a couple more comments/questions below, mainly to help me gain confidence that my understanding of MemorySSA is correct.

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:799
@@ +798,3 @@
+
+  while (!Res && LookupIter != LookupResult.second) {
+    LoadInst *Load0 = LookupIter->first;
----------------
dberlin wrote:
> gberry wrote:
> > Isn't this still O(M^2) if e.g. all of the loads have the same clobbering access and the same types? I guess the limit above controls how big M can get though.
> If that was the case, all the loads in that block are equal (because they also passed isSameOperation, etc) and GVN should have removed them already :)
> 
> Note that neither the mssa version or the non-mssa version will handle the case of two identical loads in one side of the diamond, and one load in the other. It will only hoist/merge one of identical ones.
Is that right (the bit about all the loads being equal)?

I'm thinking of a case like the below where all of the loads pass isSameOperation and all have the call as their clobbering access.  In this case you'll end up comparing all pairs of loads (assuming none of the loads must alias).

call foo
br %c, %then, %else

then:
load i32, %p1
load i32, %p2
load i32, %p3
...
br %end

else:
load i32, %p4
load i32, %p5
load i32, %p6
...
br %end

end:

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:1011
@@ +1010,3 @@
+                                             BBNode->getDFSNumOut()};
+    // First see if it's outside the dominator tree range
+    // The only way it could affect us is if it's below us in the dominator
----------------
Do you even need the domtree info here to check for barriers in a diamond?  There are references to hammocks, but as far as I can tell we never actually attempt to optimize hammocks.  If that is the case, couldn't this just check for barriers in the same block as Start?

================
Comment at: lib/Transforms/Scalar/MergedLoadStoreMotion.cpp:1052
@@ +1051,3 @@
+  //
+  // 1. There are no stores above S0 or S1 in either block.
+  // In this case, no phi node is necessary, both MemoryDefs  must have the same
----------------
Gerolf wrote:
> I continue struggling with this function. For now I just need some help understanding the comment.
> When you say "above" S0, S1 you refer to DFS numbering like Sx is above S0 in this example:
> 
> 0 Sx
> 1
> 2 S0 
> 
> Correct? This would be consistent with your DFS explanation in a previous explanation.
I think there is a bug in the case 1 scenario if there is a pre-existing phi in the bottom of the diamond that references just one of the sunk stores.
For example:

; 1 = MemDef ...
call foo
br %c %then, %else

then:
; 2 = MemDef(1)
store @A
; 3 = MemDef(2)
store @B
br %end

else:
; 4 = MemDef(1)
store @A
br %end

end:
5 = MemPhi(3, 4)

I believe in this case, after updating the end block will look like:

end:
5 = MemPhi(3, 6)
; 6 = MemDef(1)
store @A

Which seems wrong, since the phi is before it's use, but also it seems like having 6's defining access skip over the phi 5 and def 3 could cause trouble, though I'm less sure about the latter.

http://reviews.llvm.org/D8688