[PATCH] D22568: [DSE] Implement dead store elimination using MemorySSA (disabled by default).

Wed Jul 20 08:15:59 PDT 2016

dberlin added a comment.

Thanks for doing this!

Just as a meta-comment:

One of the advantages of MemorySSA is that it allows sparse analysis.  Now, for stores, because it's SSA and not SSI, you do end up having to worklist uses more often (for loads, algorithms are as easy as scalar SSA algorithms).

If you use MemorySSA as a faster memdep, and try to write iterative algorithms,  it may or may not be faster, because it's really built to be able to build ssa-like algorithms, and some of the "faster memdep" use cases don't overlap with this well.

That said, I understand you are trying to port this incrementally, so i restricted major algorithmic comments to where there are easy and significantly better ways to do a thing.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1114
@@ +1113,3 @@
+  // If we're storing the same value back to a pointer that we just loaded from,
+  // then the store can be removed.
+  if (LoadInst *DepLoad = dyn_cast<LoadInst>(SI->getValueOperand())) {
----------------
Interesting. MemorySSA looks at this the other way around

It guarantees that if you are *loading a pointer you just stored from*, MemoryUse->getDefiningAccess() will be the store you just loaded from..

(That is, it guarantees that load->getDefiningAccess() is the nearest dominating thing that *actually* aliases with the load)

There is one edge case where it would be a MemoryPhi that you could eliminate, but it's a real nonlocal edge case - when the MemoryPhi's operands are all really the same thing

```
if (a)
  1 = MemoryDef(liveOnEntry)
  store B, 5
else
  2 = MemoryDef(liveOnEntry)
  store B, 5
3 = MemoryPhi(1, 2)
MemoryUse(3)
load B

```
Not sure this case is worth optimizing. NewGVN acutally handles this already and would just replace the load uses with "5".

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1117
@@ +1116,3 @@
+    if (SI->getPointerOperand() == DepLoad->getPointerOperand() &&
+        isRemovable(SI) && memoryIsNotModifiedBetween(DepLoad, SI, AA)) {
+
----------------
Because of the above guarantee, the memory state is guaranteed to be the same as this store if and only if getMemoryAccess(DepLoad)->getDefiningAccess() == getMemoryAccess(SI).

This is a constant time check instead of memoryIsNotModifiedBetween (which also can be done faster with MemorySSA, but ...)

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1127
@@ +1126,3 @@
+
+  // Remove null stores into the calloc'ed objects
+  Constant *StoredConstant = dyn_cast<Constant>(SI->getValueOperand());
----------------
Doesn't GVN do this already?

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1134
@@ +1133,3 @@
+    if (UnderlyingPointer && isCallocLikeFn(UnderlyingPointer, TLI) &&
+        memoryIsNotModifiedBetween(UnderlyingPointer, SI, AA)) {
+      DEBUG(
----------------
This whole block could be

  // For stores, getClobberingMemoryAccess will guarantee you get the nearest dominating def that actually aliases the store. It is not needed for loads (see the comments at the top of MemorySSA.h for why this is)
  MemoryAccess *MA = MSSA->getClobberingMemoryAccess(SI);
  if (MemoryDef *MD = dyn_cast<MemoryDef>(MA)) {
     if (isCallocLikeFn(MD->getMemoryInst()) && <pointers are the same>)
....

This is going to be a lot faster than the memoryisnotmodifiedcall.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1156
@@ +1155,3 @@
+
+  // Iterate over all local memory accesses.
+  auto *BlockAccesses = MSSA->getBlockAccesses(F->getParent());
----------------
I'm a bit unclear  what this is trying to do, so i can't definitely give advice, only guess.

If you are trying to see if the only use of a store is a call to free, the following applies:

Unlike loads, store uses/defs are not guaranteed to may-alias the store (doing so requires allowing multiple phi nodes in memoryssa).

If they were, this would simply be "Process instructions in reverse order, worklist the uses of the store, if all of them are in calls to free, eliminate the store".

However, because of this, you want to do:

Process instructions in reverse order:

  <worklist the initial uses of the MemoryDef for the store>
  while(worklist not empty) 
    Pull use off worklist.
    If use is free call, ignore it.
    If use is below free call (IE  MSSA->dominates(memoryaccess for free, memoryaccess for use)), ignore it.
    (there are partially dead cases this will ignore, but let's ignore that for now)
   Otherwise:
     If use is a MemoryUse, you cannot remove the store (we already eliminated all uses after the free call above, and your other so this must be between the store and the free call. Both your original code and this code presume that you have already eliminated loads of just stored pointers, etc, so they don't get in the way ).
     If use is a MemoryDef, and getClobberingMemoryAccess(use) == original store you have store over store to the same data (IE partial or full overwrite). Not sure what you want to do here.
     If use is a MemoryDef, and getClobberingMemoryAccess(use) != original store, put the uses of this MemoryDef on worklist and continue.
    If use is a MemoryPhi, put uses of this MemoryPhi on worklist and continue.

This seems complicated, but is still going to be faster than what you have :)

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1242
@@ +1241,3 @@
+    Instruction *Inst = MD->getMemoryInst();
+    if (!Inst || !hasMemoryWrite(Inst, *TLI))
+      continue;
----------------
Errr, If it's a memoryDef, it must write to memory or otherwise affect memory ;)

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1257
@@ +1256,3 @@
+
+    // FIXME: Non-local DSE would be fun. :)
+    MemorySSA::AccessList::const_reverse_iterator RI(I->getIterator());
----------------
FWIW: You don't need this loop at all.

Much like in the rewritten merged load store motion, you can figure out what it could possibly be eliminated in favor in by hashing, and then use local ordering to tell which is earlier/later.
This is true even where they are partial overwrites (that just gets accounted for in the hash).

Whether "memory is modified" or (you have a use in the way )is entirely contained in the def/use chains of memoryssa (and MSSA->dominates), you don't need to look at the inst stream.

Even if you don't take the hashing approach, i would approach this sparsely, by starting at the top of the block, looking at each MemoryAccess, and doing something depending on the uses/defs it has. Then move to the next MemoryAccess.  You should not have to visit everything more than once, even if you do it this way.

That said, i understand you are trying to incrementally port, so i won't force you to do this.
Just sayin, if you want it to be faster, ...

https://reviews.llvm.org/D22568