[llvm-commits] [PATCH] Allow SelectionDAGBuilder to reorder loads past stores

Mon Dec 19 10:52:49 PST 2011

Hi Hal,

IMHO, this is the wrong way to fix this problem. The fundamental issue here is SelectionDAG is being used by the pre-RA scheduler so the conservative DAG building can have a significant impact on the schedule and the performance of generated code. Changing SelectionDAGBuilder to relax this is one way to go about this. But as you can tell, the current scheduler is not quite ready for this and you can see some large performance regressions.

We have long understood the current scheduling strategy is wrong. The plan for 2012 is to add a MachineInstr pre-RA scheduler and leave SelectionDAG for legalizer and dag combine. The new scheduler will probably use aliasing information to reorder loads / stores.

The problem with changing SelectionDAGBuilder is this can negatively impact optimizations done during dag combine and legalization. So as it is the patch should not be enabled by default. At best, it can be committed but turned off. Do you have a good understanding what caused the severe regressions you have identified?

Evan

On Dec 19, 2011, at 9:42 AM, Hal Finkel wrote:

> The current SelectionDAGBuilder does not allow loads to be reordered
> past stores, and does not allow stores to be reordered. This is a side
> effect of the way the critical chain is constructed: there is a queue of
> pending loads that is flushed (in parallel) to the root of the chain
> upon encountering any store (and that store is also appended to the root
> of the chain). Among other things, loop unrolling is far less effective
> than it otherwise could be.
> 
> The attached patch allows SelectionDAGBuilder to use the available alias
> analysis to reorder independent loads and stores. It changes the queue
> of pending loads into a more general queue of pending memory operations,
> and flushes, in parallel, all potentially-conflicting loads and stores
> as necessary.
> 
> This can result in a significant performance boost. On my x86_64
> machine, the average percentage decrease in execution time is ~8% (to
> calculate my performance numbers from the test suite, I've included only
> the 174 tests with a base execution time of at least 0.1s; the times of
> the shorter tests seem noisy on my machine). Of these, 131 showed a
> performance increase and 36 showed a performance decrease.
> 
> The top-5 winners were:
> MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset - 92%
> performance increase ( = runtime decrease)
> MultiSource/Benchmarks/llubenchmark/llu - 47% performance increase
> MultiSource/Applications/minisat/minisat - 47% performance increase
> MultiSource/Benchmarks/sim/sim - 40% performance increase
> MultiSource/Benchmarks/Prolangs-C++/life/life - 35.7% performance
> increase
> The top-5 losers were:
> MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame - 88%
> performance decrease
> MultiSource/Benchmarks/VersaBench/beamformer/beamformer - 49%
> performance decrease
> MultiSource/Benchmarks/MallocBench/espresso/espresso 47% performance
> decrease
> MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount -
> 21% performance decrease
> MultiSource/Benchmarks/MiBench/network-patricia/network-patricia - 20%
> performance decrease
> 
> The patch adds a few new options:
> max-parallel-chains - replaces the old MaxParallelChains constant)
> max-load-store-reorder - the maximum size of the reorder buffer -
> previously it was unlimited, but contained only stores
> no-reordering-past-stores - invokes the previous behavior
> 
> Some of the regression tests had to be updated because the order of some
> stores changed. For most of these, I just updated the test to reflect
> the new instruction sequence. The following tests I've marked as XFAIL
> because they would require larger changes (and I'd like someone with
> more experience than me to make sure that they really are okay and make
> any necessary adjustments):
> CodeGen/X86/2008-02-22-LocalRegAllocBug.ll
> CodeGen/X86/2010-09-17-SideEffectsInChain.ll
> CodeGen/X86/lea-recursion.ll
> 
> Also, there is one test-suite runtime failure on x86_64:
> MultiSource/Benchmarks/Ptrdist/ft/ft
> 
> And several test-suite runtime failures on i686:
> MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
> SingleSource/Benchmarks/Misc-C++/Large/ray
> SingleSource/Benchmarks/Misc-C++/stepanov_container
> SingleSource/Benchmarks/Shootout-C++/lists
> SingleSource/Benchmarks/Shootout-C++/lists1
> SingleSource/Benchmarks/Shootout-C++/sieve
> 
> Please review (and help with the test-suite failures).
> 
> Thank you in advance,
> Hal
> 
> -- 
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
> 
> <llvm_lsro-20111219.diff>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits