[PATCH] D13363: [DeadStoreElimination] Add support for non-local DSE
Mitch Bodart via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 22 14:09:11 PST 2016
mbodart added a comment.
I have a resolution for the issue causing bot timeouts,
as well as one unrelated fix for irreducible control flow.
Fortunately both fixes are trivial. Details below.
Chad or Ivan, how do you want to proceed?
Can you apply these fixes and re-land the change?
I used Rafael's llvm-ar.ll reproducer at
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151207/318782.html
to chase the bot timeouts.
The issue is in findSafePreds, which was getting tripped up by a switch
statement, and inadvertently adding the same block to the worklist
multiple times. The predecessor iterator needs to avoid processing
a given predecessor more than once. Keeping track with a "PredsVisited"
set resolves that, and there is no longer any noticeable compile time
difference w/wo nonlocal DSE for llvm-ar.ll.
Wrt irreducibility, I was able to construct a test case for which the
CFG search goes into a cycle and revisits the original safe block
that instigated the nonlocal DSE search, SI->getParent() in
DSE::handleNonLocalDependency(). My earlier suggestion for the search
routine had an explicit check for !SafeBlocks.count(P) to avoid this
situation, but that was elided in favor of the dominator check
DT->dominates(BB, Pred). With irreducible control flow we can have
a case where Pred is the original safe block, but BB does not dominate it.
One fix here is to pass in that original block to findSafePreds,
and replace the dominator check with DT->dominates(OrigBB, Pred).
While fairly subtle, this prevents processing any block more than once,
and is still sufficient to avoid problems with the "non loop aware"
dependence analysis.
One other nit: The SafeBlocks set is unordered, and only needed for
membership queries. Should it be declared as a SmallSet instead of
a SmallSetVector?
Finally, I collected some stats when compiling llvm-ar.ll.
There are 23138 calls to handleNonLocalDependency which
reach the worklist processing, with 84 dead instructions found.
94% of the time we process 5 or fewer predecessor blocks.
In the worst case we looked at 266 predecessors.
Blocks Frequency
0 11553 50%
1 5047 22%
2 3858 17%
3 778 3%
4 285
5 221
6 148
7 112
8 193
9 57
10 53
11-50 410
51-100 36
100-266 7
In one instance we found 5 dead instructions by searching 22 blocks.
I didn't dig into whether those could have been found in fewer blocks.
Searching more than 22 blocks was never fruitful for llvm-ar.ll.
Though we would of course want more samples to make any decisions,
limiting the number of worklist blocks processed is another
candidate for a compile time threshold, should one be needed.
OTOH, it may only matter in pathalogical cases.
- mitch
Repository:
rL LLVM
http://reviews.llvm.org/D13363
More information about the llvm-commits
mailing list