[PATCH] D28152: Cortex-A57 scheduling model for ARM backend (AArch32)

Tue Jan 31 00:48:58 PST 2017

kristof.beyls added a comment.

In https://reviews.llvm.org/D28152#660490, @andrew.zhogin wrote:

> In https://reviews.llvm.org/D28152#660355, @kristof.beyls wrote:
>
> > I'm afraid I still see the same failures for cmake-driven benchmark suites.
> >  For non-cmake-driven benchmark suites, I see the following error message at compile time:
> >
> >   DefIdx 0 exceeds machine model writes for %R0<def> = tLDRi %R0<kill>, 0, pred:14, pred:%noreg; mem:LD4[@Reg](tbaa=!12)(dereferenceable)
> >    (Try with MCSchedModel.CompleteModel set to false)incomplete machine model
> >   UNREACHABLE executed at /work/llvm-test/slave2/cross-build/build/llvm/lib/CodeGen/TargetSchedule.cpp:216!
> >
>
>
> I don't understand - tLDRi must be covered by "tLDR" regexp here:
>
>   def : InstRW<[A57Write_4cyc_1L], (instregex "LDRi12", "LDRBi12",
>     "LDRcp", "(t2|t)?LDRConstPool", "LDRLIT_ga_(pcrel|abs)",
>     "PICLDR", "tLDR")>;
>
>
> Are you sure you are using the updated patch and clang?
> And yes, I really did use lnt:
>
>   ./lnt runtest test-suite --sandbox ~/fast/sandbox_arm --no-timestamp --test-suite ~/fast/test-suite --benchmarking-only --cppflags '-O3 -DNDEBUG -mcpu=cortex-a57 -mthumb -fomit-frame-pointer ' --threads 1 --build-threads 6 --use-perf time --use-lit lit --exec-multisample 1 --only-test=SingleSource/Benchmarks --cmake-define 'CMAKE_C_FLAGS_RELEASE=""' --cmake-define 'CMAKE_CXX_FLAGS_RELEASE=""' --cc ~/fast/llvm_trunk.build/bin/clang
>
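
On the instregex question quoted above: as far as I know, TableGen's instregex matches each pattern as a prefix of the opcode name, so "tLDR" should indeed cover tLDRi. Below is a minimal sketch of that reasoning in Python, under that assumption. It uses re.match, which is likewise anchored only at the start of the string. The helper name covered_by is hypothetical, and this is an illustration, not the TableGen implementation.

```python
import re

# Patterns copied from the InstRW definition quoted above.
patterns = [
    "LDRi12", "LDRBi12", "LDRcp", "(t2|t)?LDRConstPool",
    "LDRLIT_ga_(pcrel|abs)", "PICLDR", "tLDR",
]

def covered_by(opcode, patterns):
    """Return the patterns that match `opcode` as a prefix.

    Assumption: instregex behaves like a start-anchored (prefix) match,
    which re.match approximates here.
    """
    return [p for p in patterns if re.match(p, opcode)]

print(covered_by("tLDRi", patterns))
```

If this list came back empty for tLDRi, the opcode would have no InstRW entry, which would be consistent with the "exceeds machine model writes" failure seen with the older patch.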

I double-checked, and it seems that I somehow received the old version of the patch when re-downloading it from Phabricator.
I've now downloaded the patch again and verified that I did get the latest version.
With that version, benchmarking passes.
As expected, there are lots of performance swings in both directions. On geomean across a large number of programs (both the ones in the test-suite and proprietary benchmarks), I see a 0.65% improvement for the benchmarks that report execution time and a 0.35% improvement for the benchmarks that report scores.
So, in summary, the patch results in an improvement on average.
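
For readers unfamiliar with how a geomean over per-benchmark changes is formed, here is a minimal sketch in Python. It assumes each percentage is the relative change in execution time (positive = slower with the patch), converts it to a ratio, and averages in log space. The function name geomean_change is hypothetical, and the four inputs are a hand-picked subset of the test-suite numbers in this message, so the result does not reproduce the 0.65% overall figure.

```python
import math

# Execution-time changes in percent, copied from the lists in this message.
# Positive = slower with the patch, negative = faster.
# Only a small subset, chosen for illustration.
deltas_percent = {
    "FreeBench/analyzer": 17.26,
    "McGill/queens": 10.57,
    "Adobe-C++/simple_types_constant_folding": -26.47,
    "sgefa": -22.98,
}

def geomean_change(deltas):
    """Geometric-mean change in percent for a list of percent deltas."""
    ratios = [1.0 + d / 100.0 for d in deltas]
    # Average the logs rather than multiplying, to stay numerically stable.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0

subset_change = geomean_change(list(deltas_percent.values()))
print(f"geomean change over subset: {subset_change:+.2f}%")
```

A geometric mean is used here rather than an arithmetic one because the inputs are ratios: a 2x slowdown and a 2x speedup then cancel out exactly.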

FWIW, here are the programs with the biggest swings in the test-suite:

Regressions:
MultiSource/Benchmarks/FreeBench/analyzer/analyzer	17.26%
SingleSource/Benchmarks/McGill/queens	10.57%
MultiSource/Benchmarks/Ptrdist/ft/ft	10.49%
MultiSource/Applications/siod/siod	5.32%
SingleSource/Benchmarks/BenchmarkGame/fannkuch	5.26%
SingleSource/Benchmarks/Misc/mandel-2	2.68%
MultiSource/Benchmarks/VersaBench/8b10b/8b10b	2.30%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1	1.59%
MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt	1.43%
MultiSource/Benchmarks/NPB-serial/is/is	1.34%
SingleSource/Benchmarks/Misc/perlin	1.19%
SingleSource/Benchmarks/Misc/matmul_f64_4x4	1.05%
MultiSource/Benchmarks/Ptrdist/bc/bc	1.00%

Improvements:
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding	-26.47%
MultiSource/Applications/sgefa/sgefa	-22.98%
MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt	-16.13%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl	-15.85%
SingleSource/Benchmarks/Shootout/random	-13.33%
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-random	-13.33%
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt	-8.33%
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl	-8.02%
MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl	-7.28%
MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl	-7.06%
MultiSource/Benchmarks/Trimaran/enc-rc4/enc-rc4	-6.82%
MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt	-6.67%
MultiSource/Applications/aha/aha	-6.49%
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt	-6.38%
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl	-6.34%
SingleSource/Benchmarks/Shootout/ary3	-6.32%
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary3	-6.10%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt	-5.71%
MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt	-4.76%
MultiSource/Benchmarks/Olden/mst/mst	-4.66%
MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl	-4.46%
SingleSource/Benchmarks/Misc/pi	-4.01%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt	-3.76%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky	-3.60%
SingleSource/Benchmarks/CoyoteBench/huffbench	-3.57%
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash	-3.41%
MultiSource/Benchmarks/Bullet/bullet	-3.08%
SingleSource/Benchmarks/Misc/ReedSolomon	-2.87%
SingleSource/Benchmarks/Misc/flops-2	-2.78%
SingleSource/Benchmarks/Misc-C++/mandel-text	-2.46%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl	-2.30%
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt	-2.01%
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2	-1.78%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk	-1.76%
SingleSource/Benchmarks/Stanford/FloatMM	-1.66%
SingleSource/Benchmarks/Misc-C++-EH/spirit	-1.46%
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4	-1.22%
MultiSource/Benchmarks/mafft/pairlocalalign	-1.16%
MultiSource/Benchmarks/VersaBench/dbms/dbms	-1.11%

https://reviews.llvm.org/D28152