[PATCH] D15537: limit the number of instructions per block examined by dead store elimination

Thu Dec 17 15:29:36 PST 2015

On 2015-12-17 14:11 , Philip Reames wrote:
> Have you looked at other ways to address the compile time impact? I'm 
> uncomfortable with relying on a simple cutoff unless that's the 
> absolutely last resort.
Thanks for taking a look, Philip. I chose the approach because I think 
it is a reasonable compromise between run time effort, quality of the 
generated code, and implementation effort. A couple of points about this 
approach (for simplicity, I've mentioned "100" in the text below, but 
the value is configurable):

1. The limit is not new - it already exists in 
getSimplePointerDependencyFrom. That function searches backwards over 
the instructions in the basic block, looking for a load or store that 
does or may affect the same memory location. If we don't find one before 
examining 100 instructions, we give up, with or without this change. So, 
worst case, we fail to eliminate a single dead store in over 100 
instructions. And that worst case exists with or without this patch. 
Under a very specific set of conditions (explained below), we will hit 
the worst case with the patch where we wouldn't without the patch, but 
I've never seen that happen.

[[Conditions where the patch misses a dead store that the existing code 
wouldn't miss:

The DSE pass uses getSimplePointerDependencyFrom to find earlier 
instructions that do or may load or store from/to the same address as a 
store we are analyzing. If we do find such an instruction, we then look 
at it in the DSE pass to determine if it is actually relevant to the 
store we are currently analyzing. If it is, we take an appropriate 
action (for example, if we find a store to the same location as a later 
store, with no intervening loads, we eliminate the earlier store). If 
the instruction we found turns out not to be relevant after all (store 
to an unknown location - this is returned by 
getSimplePointerDependencyFrom, but we can neither determine for sure 
that we can eliminate it, nor does it invalidate further analysis), we 
go back into getSimplePointerDependencyFrom and look for the next 
potentially relevant instruction. What is different between the existing 
behavior and what this patch implements is that, without the patch, we 
then reset the counter to 100 instructions, whereas with the patch, we 
continue counting where we left off. So, to cause a difference in 
emitted code, we need (a) a store that can be eliminated, more than 100 
instructions away from the store that kills it, (b) no loads that may 
alias with the dead store in between the two stores (because that would 
invalidate the analysis), and (c) stores that may alias with the dead 
store every less-than-100 instructions between the two stores (to keep 
the analysis from hitting the 100-instruction limit).]]

2. Without the patch, the potential cost of the analysis grows as the 
square of the basic block length. With the patch, the potential cost is 
linear in basic block length. This puts it in a better complexity class 
than other solutions I considered.

3. With the default of 100 for the limit, the worst that can happen is 
we fail to eliminate a dead store more than 100 instructions away from 
the store that kills it. If desired, we can reduce the impact on 
generated code by setting a higher limit. There is already a parameter 
that controls this. We could even allow removing the limit altogether if 
we think that is useful.

4. I have not found any cases where this patch causes dead stores to be 
retained that would otherwise be eliminated, but if it ever does become 
a problem, we can always implement a more complex solution at that point.

I did consider some alternatives to limiting the number of instructions 
we examine. However, all of them seem more effort to implement for less 
gain in compiler throughput. I figured I would put up the patch I have 
and see what people think. It's relatively simple, lowers the complexity 
class of the DSE pass, and substantially reduces build time for a couple 
of files that were taking a very long time to compile before. It does 
slightly widen the window for missing dead stores, but the worst case is 
one that we have already accepted in the existing code, that doesn't 
seem all that terrible, and I haven't seen code that missed 
optimizations with the patch that it didn't without the patch. I think 
it's a reasonable compromise, but, of course, I'll defer to your call on 
whether or not to accept the patch.

Bob