[PATCH] D13363: [DeadStoreElimination] Add support for DSE across blocks

Ivan Baev via llvm-commits llvm-commits at lists.llvm.org
Sun Oct 18 19:50:23 PDT 2015


ivanbaev added a comment.

Thanks for suggestions/comments, Mitch. My responses are inlined below.

In general I think your approach of using a reverse CFG search for this optimization is appropriate. But as Erik points out it doesn't handle if-then-else.
That can be addressed relatively easily:

Let StartBB be the block with the original StoreInst.
Let SafeBlocks be the set of blocks for which we know all paths emanating from them will lead to StoreInst
with no intervening reads of the memory location of interest. Initially, add StartBB to SafeBlocks.

The criteria for adding a new predecessor P to your WorkList are that P is reachable, P is not in SafeBlocks (not visited),
and all of P's successors are P itself (the single-block loop case) or are in SafeBlocks.

Seed your WorkList with the predecessors of StartBB that meet this criteria.
After processing a block B in HandleNonLocalDependency, when DependencyFound is false,
add B to SafeBlocks, and augment the WorkList with B's predecessors that meet the criteria.

The "P is not in SafeBlocks" acts like a "! Visited" check, except that terminal (unsafe) blocks may be
visited multiple times. To avoid that redundancy we'd need another set (UnsafeBlocks, or perhaps just VisitedBlocks).

- I added set SafeBlocks to keep track of all predecessor blocks that are safe to search through.

It is seeded with BB, the block with the original store. If no Def/Clobber dependency is found in predecessor PB, then we add PB to SafeBlocks.
Helper FindSafePreds(Blocks, SafeBlocks, PB, DT) then uses SafeBlocks to add blocks with two successors (e.g. an if-block) to the worklist.

As to the compile time threshold, I think it needs to be applied to the number of dependences and blocks seen
in HandleNonLocalDependency. That way we'll put a cap on the cost of optimizing an individual store,
rather than limiting the number of store instructions that may get optimized. I don't know what LLVM's
philosophy is on adding new command line options, but perhaps the threshold value should be overridable
that way. And perhaps a cl::opt<bool> EnableGDSE("enable-gdse"...) to control this global dead store elimination.

As for the while loop in runOnBasicBlock, I would prefer that it keep track of whether any reads were seen,
including the isSelfRead case, and then we handle non-local dependences after that loop terminates.
This is the same concept as your DependencySeen tracking in HandleNonLocalDependency, though I might
name it SafeToProcessNonLocalDeps.

- Use of static const unsigned MaxNonLocalAttempts = 100; in DSE is similar to the use of static const unsigned MaxIVUsers = 200; in LoopStrengthReduce.

Note that the large portion of DSE compile-time is due not to proper DSE but to the underlying MemoryDependenceAnalysis, 
especially for computing the non-local dependency information. Controlling computation in MemoryDependenceAnalysis is 
outside the scope of DSE. Given the iterative way of adding predecessors and searching them locally for non-local DSE 
I don't expect a substantial increase in DSE compile-time.

lib/Transforms/Scalar/DeadStoreElimination.cpp, 857
Instead of InstPt, shouldn't the search start with PB->end()?
It looks like getPointerDependencyFrom does a "--" operation on this iterator
to get the first instruction to start its analysis from.

- Each BB ends with a terminator instruction and these should not have contain dependencies for the store.

See also  DSE::HandleFree().

line 900
I don't understand why we are stopping the search here.
If the overwrite is partial, there may still be preceding stores
that could be eliminated.

- In the local BB case, "if (OR == OverwriteComplete) ..." is followed by

"else if (OR == OverwriteEnd && isShortenable(DepWrite)) ..."
I agree that we can be more aggressive here but suggest this to be added by a separate patch.

line 906
Shouldn't this check also include && Dep.isNonLocal()?

- We know that the first dependence - which triggers invocation of HandleNonLocalDependency(Inst) - is non-local.

In HandleNonLocalDependency() we use MemoryDependenceAnalysis::getPointerDependencyFrom()
to performs local search: it walks backwards through the basic block, looking for dependencies.

In each iteration of: 
 while (!Blocks.empty()) {

    BasicBlock *PB = Blocks.pop_back_val();
    ...
    MemDepResult Dep = MD->getPointerDependencyFrom(Loc, false, InstPt, PB, SI);
    ...
  }

we are looking for (local) dependencies within PB for the original store instruction. A non-local dependency found 
when we search PB should not impact how we process PB.


http://reviews.llvm.org/D13363





More information about the llvm-commits mailing list