[PATCH] D13363: [DeadStoreElimination] Add support for non-local DSE

Tue Oct 27 17:11:25 PDT 2015

mbodart added a comment.

Hi Ivan,

I added a few more responses.

Please be patient as this is a non-trivial optimization, and for me at least it takes some time to poke around and verify things.
I'm still not confident enough in my understanding of alias analysis to know whether underlyingObjectsDoNotAlias is complete, for example.

Also, please try to address some of the comments from David regarding naming conventions.

regards,

- Mitch


================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:572-583
@@ -582,24 +571,14 @@
 
-      // If we find a write that is a) removable (i.e., non-volatile), b) is
-      // completely obliterated by the store to 'Loc', and c) which we know that
-      // 'Inst' doesn't load from, then we can remove it.
-      if (isRemovable(DepWrite) &&
-          !isPossibleSelfRead(Inst, Loc, DepWrite, *TLI, *AA)) {
-        int64_t InstWriteOffset, DepWriteOffset;
-        OverwriteResult OR =
-            isOverwrite(Loc, DepLoc, DL, *TLI, DepWriteOffset, InstWriteOffset);
-        if (OR == OverwriteComplete) {
-          DEBUG(dbgs() << "DSE: Remove Dead Store:\n  DEAD: "
-                << *DepWrite << "\n  KILLER: " << *Inst << '\n');
-
-          // Delete the store and now-dead instructions that feed it.
-          DeleteDeadInstruction(DepWrite, *MD, *TLI);
-          ++NumFastStores;
-          MadeChange = true;
-
-          // DeleteDeadInstruction can delete the current instruction in loop
-          // cases, reset BBI.
-          BBI = Inst;
-          if (BBI != BB.begin())
-            --BBI;
+      while (InstDep.isDef() || InstDep.isClobber()) {
+        // Get the memory clobbered by the instruction we depend on.  MemDep will
+        // skip any instructions that 'Loc' clearly doesn't interact with.  If we
+        // end up depending on a may- or must-aliased load, then we can't optimize
+        // away the store and we bail out.  However, if we depend on on something
+        // that overwrites the memory location we *can* potentially optimize it.
+        //
+        // Find out what memory location the dependent instruction stores.
+        Instruction *DepWrite = InstDep.getInst();
+        MemoryLocation DepLoc = getLocForWrite(DepWrite, *AA);
+        // If we didn't get a useful location, or if it isn't a size, bail out.
+        if (!DepLoc.Ptr)
           break;
----------------
ivanbaev wrote:
> The new patch is more aggressive in searching for redundant stores. Let's defer your suggestion for a subsequent patch.
That's fine with me.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:653
@@ +652,3 @@
+      }
+    } else if (InstDep.isNonLocal()) { // DSE across BB
+      if (++NumNonLocalAttempts < MaxNonLocalAttempts)
----------------
The self read issue has not been addressed.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:654
@@ +653,3 @@
+    } else if (InstDep.isNonLocal()) { // DSE across BB
+      if (++NumNonLocalAttempts < MaxNonLocalAttempts)
+        MadeChange |= handleNonLocalDependency(Inst);
----------------
ivanbaev wrote:
> MemoryDependenceAnalysis has BlockScanLimit = 100 and NumResultsLimit = 100. With these and the new MaxNonLocalAttempts = 100 we have enough handles to control compile-time. 
I see.  Thanks for pointing that out.
Those seem reasonable as a starting point.
But as Bruno suggested, some actual compile time measurments are probably in order, just to confirm.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:808
@@ +807,3 @@
+      continue;
+    if (!DT->isReachableFromEntry(Pred) || DT->dominates(BB, Pred))
+      continue;
----------------
I still don't understand why we need to check DT->dominates(BB, Pred) here.
I'm guessing you're trying to guard against an infinite loop, which of course
we need to do.  But I think that would be automatically handled by instead checking
for !SafeBlocks.count(Pred).  In addition to allowing the search to proceed further, this
would eliminate the need for the special check of Pred == BB, since BB must be in SafeBlocks.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:814
@@ +813,3 @@
+      BasicBlock *Succ = *II;
+      if (Succ == BB || Succ == Pred)
+        continue;
----------------
The Succ == BB check is correct, but redundant as SafeBlocks.count(BB) must be non-zero. 
I think the code would read better if this and the next check were combined into one:

if (Succ != Pred && !SafeBlocks.count(Succ))

but would be OK with your current structure.
Either way, however, it would be useful to add a clarifying source comment to the effect:

// Pred is only known to be optimizeable if all of its successors are either safe or are Pred itself.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:892
@@ +891,3 @@
+      // Don't remove a store within single-block loops;
+      // we need more analysis: e.g. looking for an interferring load
+      // above the store within the loop, etc.
----------------
ivanbaev wrote:
> The idea is to allow the search continue through simple loops, but not to remove stores within loops. 
Sorry, I'm not sure which comment you were responding to.
Phabricator for some reason seems to be placing some of my comments
above the lines of interest, and others below.

Regarding my "loop invariant" comment, I'm saying that the test for whether the current block
is a single-block loop (getNumSuccessors() == 2, etc) should probably be done outside of this loop.  I'm also not sure if checking explicitly for two successors is sufficient.  A program could conceivably have a switch statement with one case having a goto back to a label just prior to the switch.  I'm not familiar enough with LLVM optimizations to know whether that can lead to an indirect branch successor which is the block containing the indirect branch itself.  But it's theoretically possible.  My recommendation here is to use a successor iterator, to match the behavior present in FindSafePreds.


http://reviews.llvm.org/D13363