[llvm-commits] [PATCH] Allow SelectionDAGBuilder to reorder loads past stores

Mon Dec 19 11:15:18 PST 2011

On Mon, 2011-12-19 at 10:08 -0800, Owen Anderson wrote:
> Hal,
> 
> How does this compare to the -combiner-alias-analysis and -combiner-global-alias-analysis already in SelectionDAG?  I gave the former a try about a year ago, and found that it at least didn't cause things to fail.
> 

Doing this in SelectionDAGBuilder is a better approach because it uses
the alias analysis while the IR instructions are still available. This
is important because the alias analysis is more powerful with the
original instructions than with reconstructed locations (which is all
the DAG combiner has to work with). For one thing, by the time the
combiner sees the loads and stores they may have offsets, and there is
no good way to use ptr+offset pairs with the alias analysis. My patch
can also deal correctly with intrinsics, calls, etc. because it uses
the original instructions in combination with the mod/ref interface of
the alias analysis.
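To make the mod/ref point concrete, here is a toy C++ sketch of the ordering rule a builder could apply between a call and a memory access once it has a mod/ref answer for the access's location. All names are hypothetical; the enum only mirrors the idea of the alias analysis's mod/ref results and is not LLVM's API. A load has to stay ordered only against a call that may write its location, while a store has to stay ordered against a call that may read or write it.

```cpp
#include <cassert>

// Hypothetical stand-in for a mod/ref answer: what a call may do to the
// memory location touched by a given load or store.
enum class CallEffect { NoModRef, Ref, Mod, ModRef };

// Must the access keep its original order relative to the call?
// A load conflicts only with a call that may write its location;
// a store conflicts with a call that may read or write it.
bool mustOrderWithCall(CallEffect Effect, bool AccessIsStore) {
  switch (Effect) {
  case CallEffect::NoModRef:
    return false;
  case CallEffect::Ref:
    return AccessIsStore;
  case CallEffect::Mod:
  case CallEffect::ModRef:
    return true;
  }
  return true; // unreachable; stay conservative
}
```

A purely offset-based view of the call has nothing comparable to ask, which is one reason the combiner has to be conservative around calls.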

As a practical matter, my patch does a much better job than the
-combiner-alias-analysis/-combiner-global-alias-analysis pair. I tried
these at first, but because they cannot deal correctly with the
ptr+offset pairs, etc., the conservative approximations that they need
to make severely constrain what they can do. With a set of benchmarks I've
constructed with a bunch of unrolled loops, the DAG combiner with those
flags was essentially not able to do anything. My patch, however,
handles these quite well. This seems to be true of a number of
applications in the test suite as well.

I think that putting in the effort to make this work correctly will be
worthwhile (I suspect that the test-suite failures are backend bugs, but
I can't be certain).

 -Hal

> --Owen
> 
> 
> On Dec 19, 2011, at 9:42 AM, Hal Finkel wrote:
> 
> > The current SelectionDAGBuilder does not allow loads to be reordered
> > past stores, and does not allow stores to be reordered. This is a side
> > effect of the way the critical chain is constructed: there is a queue of
> > pending loads that is flushed (in parallel) to the root of the chain
> > upon encountering any store (and that store is also appended to the root
> > of the chain). Among other things, loop unrolling is far less effective
> > than it otherwise could be.
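The chain construction described above can be modeled with a small C++ sketch. This is a toy with hypothetical names, not SelectionDAGBuilder's code: each operation records the operations it is directly chained after, loads hang off the current root and wait in a pending queue, and a store first joins all pending loads in parallel (the role a TokenFactor node plays in the DAG) and then becomes the new root.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model: Deps[i] lists the ops that op i is directly
// chained after; -1 stands for the entry node.
using ChainDeps = std::vector<std::vector<int>>;

// Pre-patch behavior: loads are chained on the current root and queued;
// a store is chained after all pending loads (joined in parallel) and
// then becomes the new root, so nothing is ever reordered past a store.
ChainDeps buildChainOld(const std::vector<bool> &IsStore) {
  ChainDeps Deps(IsStore.size());
  int Root = -1;
  std::vector<int> PendingLoads;
  for (int I = 0; I < static_cast<int>(IsStore.size()); ++I) {
    if (!IsStore[I]) {
      Deps[I] = {Root};          // load: hangs off the current root
      PendingLoads.push_back(I); // remembered until the next store
    } else {
      // Store: flush the queue; with no pending loads, use the root.
      Deps[I] = PendingLoads.empty() ? std::vector<int>{Root} : PendingLoads;
      PendingLoads.clear();
      Root = I;                  // the store is the new root
    }
  }
  return Deps;
}
```

For the sequence load, load, store, load, store, the two leading loads both hang off the entry node and may run in parallel, but the third load is chained after the first store and the final store after that load, regardless of which addresses they touch.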
> > 
> > The attached patch allows SelectionDAGBuilder to use the available alias
> > analysis to reorder independent loads and stores. It changes the queue
> > of pending loads into a more general queue of pending memory operations,
> > and flushes, in parallel, all potentially-conflicting loads and stores
> > as necessary.
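A minimal sketch of the alias-aware idea follows, again as a hypothetical C++ toy rather than the patch itself. It assumes every access touches a byte range of a single object, so "may alias" reduces to range overlap (the patch asks LLVM's alias analysis instead), and it chains each new access only after the earlier accesses it conflicts with: those that may alias it where at least one of the two is a store.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy access: touches bytes [Offset, Offset + Size) of one
// object. Range overlap stands in for an alias-analysis query.
struct Access {
  bool IsStore;
  long Offset;
  long Size;
};

bool mayAlias(const Access &A, const Access &B) {
  return A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
}

// Two accesses must stay ordered only if one writes and they may alias.
bool conflicts(const Access &A, const Access &B) {
  return (A.IsStore || B.IsStore) && mayAlias(A, B);
}

// Every earlier access stays "pending" (unbounded in this sketch); a new
// access is chained only after the pending accesses it conflicts with,
// or after the entry node (-1) if there are none.
std::vector<std::vector<int>> buildChainAA(const std::vector<Access> &Ops) {
  std::vector<std::vector<int>> Deps(Ops.size());
  for (int I = 0; I < static_cast<int>(Ops.size()); ++I) {
    for (int P = 0; P < I; ++P)
      if (conflicts(Ops[P], Ops[I]))
        Deps[I].push_back(P);
    if (Deps[I].empty())
      Deps[I].push_back(-1);
  }
  return Deps;
}
```

In an unrolled-loop pattern of stores to bytes 0..3 and 4..7 followed by loads of bytes 0..3 and 8..11, the two stores are independent, the first load waits only for the store it reads, and the second load may be scheduled ahead of both stores.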
> > 
> > This can result in a significant performance boost. On my x86_64
> > machine, the average percentage decrease in execution time is ~8% (to
> > calculate my performance numbers from the test suite, I've included only
> > the 174 tests with a base execution time of at least 0.1s; the times of
> > the shorter tests seem noisy on my machine). Of these, 131 showed a
> > performance increase and 36 showed a performance decrease.
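The message does not spell out the formula behind these figures. The hypothetical sketch below assumes the per-test number is the percentage decrease in runtime relative to the baseline (positive means faster), averaged over only the tests whose baseline time is at least 0.1s.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-test timing pair, in seconds.
struct Timing {
  double Base; // runtime without the patch
  double New;  // runtime with the patch
};

// Percentage decrease in runtime relative to the baseline.
double percentDecrease(const Timing &T) {
  return 100.0 * (T.Base - T.New) / T.Base;
}

// Mean over tests with Base >= MinBase; shorter tests are skipped as noisy.
double averagePercentDecrease(const std::vector<Timing> &Ts,
                              double MinBase = 0.1) {
  double Sum = 0.0;
  int Count = 0;
  for (const Timing &T : Ts) {
    if (T.Base < MinBase)
      continue;
    Sum += percentDecrease(T);
    ++Count;
  }
  return Count ? Sum / Count : 0.0;
}
```

For example, baselines of 2s and 4s that become 1s and 5s give 50% and -25%; a 0.05s test is excluded, so the average is 12.5%.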
> > 
> > The top-5 winners were:
> > MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset - 92% performance increase (= runtime decrease)
> > MultiSource/Benchmarks/llubenchmark/llu - 47% performance increase
> > MultiSource/Applications/minisat/minisat - 47% performance increase
> > MultiSource/Benchmarks/sim/sim - 40% performance increase
> > MultiSource/Benchmarks/Prolangs-C++/life/life - 35.7% performance increase
> > 
> > The top-5 losers were:
> > MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame - 88% performance decrease
> > MultiSource/Benchmarks/VersaBench/beamformer/beamformer - 49% performance decrease
> > MultiSource/Benchmarks/MallocBench/espresso/espresso - 47% performance decrease
> > MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 21% performance decrease
> > MultiSource/Benchmarks/MiBench/network-patricia/network-patricia - 20% performance decrease
> > 
> > The patch adds a few new options:
> > max-parallel-chains - replaces the old MaxParallelChains constant
> > max-load-store-reorder - the maximum size of the reorder buffer
> > (previously it was unlimited, but contained only loads)
> > no-reordering-past-stores - invokes the previous behavior
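The interaction of the last two options can be sketched by adding a bounded pending queue and a legacy switch to a toy chain builder. Everything here is hypothetical, including the default cap of 16: when the pending queue reaches the cap, it is flushed in parallel into a new root; in legacy mode every pair of accesses is treated as possibly aliasing and each store becomes the new root, which reproduces the old chain shape. max-parallel-chains is not modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy access: touches bytes [Offset, Offset + Size).
struct Access {
  bool IsStore;
  long Offset;
  long Size;
};

// Hypothetical knobs mirroring the options above; the default cap is
// illustrative only, not the patch's value.
struct ReorderOptions {
  unsigned MaxLoadStoreReorder = 16;   // cap on the pending queue
  bool NoReorderingPastStores = false; // reproduce the old behavior
};

// Deps[i] lists the ops that op i is chained after (-1 = entry node).
std::vector<std::vector<int>> buildChain(const std::vector<Access> &Ops,
                                         const ReorderOptions &Opts) {
  std::vector<std::vector<int>> Deps(Ops.size());
  std::vector<int> Root = {-1};
  std::vector<int> Pending;
  for (int I = 0; I < static_cast<int>(Ops.size()); ++I) {
    const Access &A = Ops[I];
    for (int P : Pending) {
      const Access &B = Ops[P];
      bool Overlap =
          A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
      // Legacy mode: assume every pair may alias.
      bool MayAlias = Opts.NoReorderingPastStores || Overlap;
      if ((A.IsStore || B.IsStore) && MayAlias)
        Deps[I].push_back(P);
    }
    if (Deps[I].empty())
      Deps[I] = Root; // no conflict: hang off the current root
    if (A.IsStore && Opts.NoReorderingPastStores) {
      Root = {I};     // old behavior: the store becomes the new root
      Pending.clear();
      continue;
    }
    Pending.push_back(I);
    if (Pending.size() >= Opts.MaxLoadStoreReorder) {
      Root = Pending; // buffer full: flush everything in parallel
      Pending.clear();
    }
  }
  return Deps;
}
```

For stores to bytes 0..3 and 4..7 followed by loads of bytes 0..3 and 8..11, a cap of 2 forces the load of bytes 8..11 to wait for both stores even though it aliases neither; in this model a smaller buffer trades scheduling freedom for fewer alias queries per access.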
> > 
> > Some of the regression tests had to be updated because the order of some
> > stores changed. For most of these, I just updated the test to reflect
> > the new instruction sequence. I've marked the following tests as XFAIL
> > because they would require larger changes (and I'd like someone with
> > more experience than me to make sure that they really are okay and make
> > any necessary adjustments):
> > CodeGen/X86/2008-02-22-LocalRegAllocBug.ll
> > CodeGen/X86/2010-09-17-SideEffectsInChain.ll
> > CodeGen/X86/lea-recursion.ll
> > 
> > Also, there is one test-suite runtime failure on x86_64:
> > MultiSource/Benchmarks/Ptrdist/ft/ft
> > 
> > And several test-suite runtime failures on i686:
> > MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
> > SingleSource/Benchmarks/Misc-C++/Large/ray
> > SingleSource/Benchmarks/Misc-C++/stepanov_container
> > SingleSource/Benchmarks/Shootout-C++/lists
> > SingleSource/Benchmarks/Shootout-C++/lists1
> > SingleSource/Benchmarks/Shootout-C++/sieve
> > 
> > Please review (and help with the test-suite failures).
> > 
> > Thank you in advance,
> > Hal
> > 
> > -- 
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> > 
> > <llvm_lsro-20111219.diff>
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory