[PATCH] D13363: [DeadStoreElimination] Add support for non-local DSE

Wed Feb 3 19:22:25 PST 2016

ivanbaev added a comment.

Hi Mitch,

Thanks for working on this.

In http://reviews.llvm.org/D13363#334021, @mbodart wrote:

> I have a resolution for the issue causing bot timeouts,
>  as well as one unrelated fix for irreducible control flow.
>  Fortunately both fixes are trivial.  Details below.
>
> Chad or Ivan, how do you want to proceed?
>  Can you apply these fixes and re-land the change?

Feel free to proceed as Chad suggested.

> I used Rafael's llvm-ar.ll reproducer at

>  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151207/318782.html

>  to chase the bot timeouts.

> 

> The issue is in findSafePreds, which was getting tripped up by a switch

>  statement, and inadvertently adding the same block to the worklist

>  multiple times.  The predecessor iterator needs to avoid processing

>  a given predecessor more than once.  Keeping track with a "PredsVisited"

>  set resolves that, and there is no longer any noticeable compile time

>  difference w/wo nonlocal DSE for llvm-ar.ll.

Is it something like:

SmallPtrSet<BasicBlock *, 8> Visited;
for (auto I = pred_begin(BB), E = pred_end(BB); I != E; ++I) {

  if (!Visited.insert(*I).second)
    continue;

> Wrt irreducibility, I was able to construct a test case for which the

>  CFG search goes into a cycle and revisits the original safe block

>  that instigated the nonlocal DSE search, SI->getParent() in

>  DSE::handleNonLocalDependency().  My earlier suggestion for the search

>  routine had an explicit check for !SafeBlocks.count(P) to avoid this

>  situation, but that was elided in favor of the dominator check

>  DT->dominates(BB, Pred).  With irreducible control flow we can have

>  a case where Pred is the original safe block, but BB does not dominate it.

> 

> One fix here is to pass in that original block to findSafePreds,

>  and replace the dominator check with DT->dominates(OrigBB, Pred).

>  While fairly subtle, this prevents processing any block more than once,

>  and is still sufficient to avoid problems with the "non loop aware"

>  dependence analysis.

Do you propose replacing DT->dominates(BB, Pred) in

  if (!DT->isReachableFromEntry(Pred) || DT->dominates(BB, Pred))

with DT->dominates(OrigBB, Pred), or adding the latter as a third check?

It would be great to have this irreducible CFG test added to the existing tests for DSE.

> One other nit:  The SafeBlocks set is unordered, and only needed for

>  membership queries.  Should it be declared as a SmallSet instead of

>  a SmallSetVector?

That's fine. We actually need a SmallSetVector with reverse insertion order iteration characteristics.

> Finally, I collected some stats when compiling llvm-ar.ll.

> 

> There are 23138 calls to handleNonLocalDependency which

>  reach the worklist processing, with 84 dead instructions found.

>  94% of the time we process 5 or fewer predecessor blocks.

>  In the worst case we looked at 266 predecessors.

> 

> Blocks  Frequency

>  0       11553  50%

>  1        5047  22%

>  2        3858  17%

>  3         778   3%

>  4         285

>  5         221

>  6         148

>  7         112

>  8         193

>  9          57

>  10         53

>  11-50     410

>  51-100     36

>  100-266     7

> 

> In one instance we found 5 dead instructions by searching 22 blocks.

>  I didn't dig into whether those could have been found in fewer blocks.

>  Searching more than 22 blocks was never fruitful for llvm-ar.ll.

> 

> Though we would of course want more samples to make any decisions,

>  limiting the number of worklist blocks processed is another

>  candidate for a compile time threshold, should one be needed.

>  OTOH, it may only matter in pathalogical cases.

Interesting data; it may vary from a source language to a source language. I agree that we need more samples to justify adding such heuristic. For now, I propose:

  static const unsigned MaxNonLocalAttempts = 10;

Ivan

Repository:
  rL LLVM

http://reviews.llvm.org/D13363