[PATCH] D15537: limit the number of instructions per block examined by dead store elimination

Tue Jan 5 15:40:03 PST 2016

Hi Bob,

Thanks for taking a look at this. Is there a testcase you can share?

As you noticed,
MemoryDependenceAnalysis::getSimplePointerDependencyFrom already uses
two mechanisms to circumvent compile time issues: (1) each scan/run is
limited by BlockScanLimit and (2) OrderedBasicBlock is used to speed
up capture analysis.

By your description it seems that the the bottleneck arrises from the
successive calls to getSimplePointerDependencyFrom, where even each
being limited by 100, it does not scale. However, by "continuing
counting where we left off" we only analyse 100 instructions total.
What's the total time your testcase takes to compile before and after
your patch? How long it would take if you pick different `Limit`
values?

Additionally, did you try instantiating an OrderedBasicBlock before
"while (InstDep.isDef() || InstDep.isClobber())" and passing the
object down through getPointerDependencyFrom instead of creating a new
one every time getSimplePointerDependencyFrom is called? I believe
that could save a lot of extra basic block walk during capture
analysis (which is very expensive for large basic blocks) and could
avoid placing more thresholds.

On Thu, Dec 17, 2015 at 3:29 PM, Bob Haarman via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
> On 2015-12-17 14:11 , Philip Reames wrote:
>>
>> Have you looked at other ways to address the compile time impact? I'm
>> uncomfortable with relying on a simple cutoff unless that's the absolutely
>> last resort.
>
> Thanks for taking a look, Philip. I chose the approach because I think it is
> a reasonable compromise between run time effort, quality of the generated
> code, and implementation effort. A couple of points about this approach (for
> simplicity, I've mentioned "100" in the text below, but the value is
> configurable):
>
> 1. The limit is not new - it already exists in
> getSimplePointerDependencyFrom. That function searches backwards over the
> instructions in the basic block, looking for a load or store that does or
> may affect the same memory location. If we don't find one before examining
> 100 instructions, we give up, with or without this change. So, worst case,
> we fail to eliminate a single dead store in over 100 instructions. And that
> worst case exists with or without this patch. Under a very specific set of
> conditions (explained below), we will hit the worst case with the patch
> where we wouldn't without the patch, but I've never seen that happen.
>
> [[Conditions where the patch misses a dead store that the existing code
> wouldn't miss:
>
> The DSE pass uses getSimplePointerDependencyFrom to find earlier
> instructions that do or may load or store from/to the same address as a
> store we are analyzing. If we do find such an instruction, we then look at
> it in the DSE pass to determine if it is actually relevant to the store we
> are currently analyzing. If it is, we take an appropriate action (for
> example, if we find a store to the same location as a later store, with no
> intervening loads, we eliminate the earlier store). If the instruction we
> found turns out not to be relevant after all (store to an unknown location -
> this is returned by getSimplePointerDependencyFrom, but we can neither
> determine for sure that we can eliminate it, nor does it invalidate further
> analysis), we go back into getSimplePointerDependencyFrom and look for the
> next potentially relevant instruction. What is different between the
> existing behavior and what this patch implements is that, without the patch,
> we then reset the counter to 100 instructions, whereas with the patch, we
> continue counting where we left off. So, to cause a difference in emitted
> code, we need (a) a store that can be eliminated, more than 100 instructions
> away from the store that kills it, (b) no loads that may alias with the dead
> store in between the two stores (because that would invalidate the
> analysis), and (c) stores that may alias with the dead store every
> less-than-100 instructions between the two stores (to keep the analysis from
> hitting the 100-instruction limit).]]
>
> 2. Without the patch, the potential cost of the analysis grows as the square
> of the basic block length. With the patch, the potential cost is linear in
> basic block length. This puts it in a better complexity class than other
> solutions I considered.
>
> 3. With the default of 100 for the limit, the worst that can happen is we
> fail to eliminate a dead store more than 100 instructions away from the
> store that kills it. If desired, we can reduce the impact on generated code
> by setting a higher limit. There is already a parameter that controls this.
> We could even allow removing the limit altogether if we think that is
> useful.
>
> 4. I have not found any cases where this patch causes dead stores to be
> retained that would otherwise be eliminated, but if it ever does become a
> problem, we can always implement a more complex solution at that point.
>
> I did consider some alternatives to limiting the number of instructions we
> examine. However, all of them seem more effort to implement for less gain in
> compiler throughput. I figured I would put up the patch I have and see what
> people think. It's relatively simple, lowers the complexity class of the DSE
> pass, and substantially reduces build time for a couple of files that were
> taking a very long time to compile before. It does slightly widen the window
> for missing dead stores, but the worst case is one that we have already
> accepted in the existing code, that doesn't seem all that terrible, and I
> haven't seen code that missed optimizations with the patch that it didn't
> without the patch. I think it's a reasonable compromise, but, of course,
> I'll defer to your call on whether or not to accept the patch.
>
> Bob
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

-- 
Bruno Cardoso Lopes
http://www.brunocardoso.cc