[PATCH] D13363: [DeadStoreElimination] Add support for non-local DSE

Mitch Bodart via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 22 14:09:11 PST 2016


mbodart added a comment.

I have a resolution for the issue causing bot timeouts,
as well as one unrelated fix for irreducible control flow.
Fortunately both fixes are trivial.  Details below.

Chad or Ivan, how do you want to proceed?
Can you apply these fixes and re-land the change?

I used Rafael's llvm-ar.ll reproducer at
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151207/318782.html
to chase the bot timeouts.

The issue is in findSafePreds, which was getting tripped up by a switch
statement, and inadvertently adding the same block to the worklist
multiple times.  The predecessor iterator needs to avoid processing
a given predecessor more than once.  Keeping track with a "PredsVisited"
set resolves that, and there is no longer any noticeable compile time
difference w/wo nonlocal DSE for llvm-ar.ll.

Wrt irreducibility, I was able to construct a test case for which the
CFG search goes into a cycle and revisits the original safe block
that instigated the nonlocal DSE search, SI->getParent() in
DSE::handleNonLocalDependency().  My earlier suggestion for the search
routine had an explicit check for !SafeBlocks.count(P) to avoid this
situation, but that was elided in favor of the dominator check
DT->dominates(BB, Pred).  With irreducible control flow we can have
a case where Pred is the original safe block, but BB does not dominate it.

One fix here is to pass in that original block to findSafePreds,
and replace the dominator check with DT->dominates(OrigBB, Pred).
While fairly subtle, this prevents processing any block more than once,
and is still sufficient to avoid problems with the "non loop aware"
dependence analysis.

One other nit:  The SafeBlocks set is unordered, and only needed for
membership queries.  Should it be declared as a SmallSet instead of
a SmallSetVector?

Finally, I collected some stats when compiling llvm-ar.ll.

There are 23138 calls to handleNonLocalDependency which
reach the worklist processing, with 84 dead instructions found.
94% of the time we process 5 or fewer predecessor blocks.
In the worst case we looked at 266 predecessors.

Blocks  Frequency
0       11553  50%
1        5047  22%
2        3858  17%
3         778   3%
4         285
5         221
6         148
7         112
8         193
9          57
10         53
11-50     410
51-100     36
100-266     7

In one instance we found 5 dead instructions by searching 22 blocks.
I didn't dig into whether those could have been found in fewer blocks.
Searching more than 22 blocks was never fruitful for llvm-ar.ll.

Though we would of course want more samples to make any decisions,
limiting the number of worklist blocks processed is another
candidate for a compile time threshold, should one be needed.
OTOH, it may only matter in pathalogical cases.

- mitch


Repository:
  rL LLVM

http://reviews.llvm.org/D13363





More information about the llvm-commits mailing list