[polly] r240689 - Enable ISL's small integer optimization

Thu Jun 25 23:55:54 PDT 2015

On 06/25/2015 10:47 PM, Michael Kruse wrote:
> Author: meinersbur
> Date: Thu Jun 25 15:47:35 2015
> New Revision: 240689
>
> URL: http://llvm.org/viewvc/llvm-project?rev=240689&view=rev
> Log:
> Enable ISL's small integer optimization
>
> Summary:
> With small integer optimization (short: sio) enabled, ISL uses 32 bit
> integers for its arithmetic and only falls back to a big integer library
> (in the case of Polly: IMath) if an operation's result is too large.
> This gives a massive performance boost for most application using ISL.
> For instance, experiments with ppcg (polyhedral source-to-source
> compiler) show speed-ups of 5.8 (compared to plain IMath), respectively
> 2.7 (compared to GMP).
>
> In Polly, a smaller fraction of the total compile time is taken by ISL,
> but the speed-ups are still very significant. The buildbots measure
> compilation speed-up up to 1.8 (oourafft, floyd-warshall, symm). All
> Polybench benchmarks compile in at least 9% less time, and about 20%
> less on average.
>
> Detailed Polybench compile time results (median of 10):
> correlation     -25.51%
> covariance      -24.82%
> 2mm             -26.64%
> 3mm             -28.69%
> atax            -13.70%
> bicg            -10.78%
> cholesky        -40.67%
> doitgen         -11.60%
> gemm            -11.54%
> gemver          -10.63%
> gesummv         -11.54%
> mvt              -9.43%
> symm            -41.25%
> syr2k           -14.71%
> syrk            -14.52%
> trisolv         -17.65%
> trmm             -9.78%
> durbin          -19.32%
> dynprog          -9.09%
> gramschmidt     -15.38%
> lu              -21.77%
> floyd-warshall  -42.71%
> reg_detect      -41.17%
> adi             -36.69%
> fdtd-2d         -32.61%
> fdtd-apml       -21.90%
> jacobi-1d-imper  -9.41%
> jacobi-2d-imper -27.65%
> seidel-2d       -31.00%

Very nice. Here are the corresponding performance bot results:

"clang -O3 -mllvm -polly" BEFORE
vs.
"clang -O3 -mllvm -polly" AFTER

http://llvm.org/perf/db_default/v4/nts/27877?compare_to=27869

"clang -O3"
vs.
"clang -O3 -mllvm -polly" BEFORE

http://llvm.org/perf/db_default/v4/nts/27869?compare_to=27876

"clang -O3"
vs.
"clang -O3 -mllvm -polly" AFTER

http://llvm.org/perf/db_default/v4/nts/27877?compare_to=27876

Instead of a 450% slowdown compared to LLVM in terms of compile time,
the largest slowdown is now 200% (0.13 s to 0.44 s), and most kernels show
even less slowdown. A large part of the remaining slowdown (about half
of it) is increased LLVM codegen time due to code versioning.
A 200% slowdown may sound like a lot, but these are exactly the loop kernels
we optimize, so their overall compile time is commonly less than a
second. The compile-time impact on entire applications is commonly a lot less.

There is still some headroom for further optimizations, which we are 
aiming for. However, for now, thanks again to Michael Kruse for this 
great patch and to Pratik Bhatu for the preparations and preliminary 
studies that allowed us to understand the performance benefits such an
optimization gives.
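
For readers unfamiliar with the idea, the fallback scheme described in the
commit message can be sketched roughly as below. This is only an
illustration under simplifying assumptions, not ISL's actual code: the
names sio_int, sio_from_i64, sio_add and sio_mul are hypothetical, and the
big-integer path (IMath in Polly's case) is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged value (NOT ISL's real data structure): either an
 * exact 32-bit "small" integer, or a marker that the value needs the
 * big-integer library. */
typedef struct {
    bool is_small;   /* true: `small` holds the exact value */
    int32_t small;   /* only meaningful when is_small is true */
} sio_int;

/* Keep the value small if it fits into 32 bits, otherwise flag the
 * fallback. A real implementation would allocate a big integer here. */
static sio_int sio_from_i64(int64_t wide) {
    sio_int r;
    if (wide >= INT32_MIN && wide <= INT32_MAX) {
        r.is_small = true;
        r.small = (int32_t)wide;
    } else {
        r.is_small = false;
        r.small = 0;
    }
    return r;
}

/* Fast path for addition: compute in 64 bits, which cannot overflow for
 * two 32-bit operands, then check whether the result is still small. */
static sio_int sio_add(sio_int a, sio_int b) {
    assert(a.is_small && b.is_small); /* sketch handles small inputs only */
    return sio_from_i64((int64_t)a.small + (int64_t)b.small);
}

/* Fast path for multiplication: the product of two 32-bit values always
 * fits into 64 bits, so the same range check applies. */
static sio_int sio_mul(sio_int a, sio_int b) {
    assert(a.is_small && b.is_small); /* sketch handles small inputs only */
    return sio_from_i64((int64_t)a.small * (int64_t)b.small);
}
```

In this sketch, sio_mul on 6 and 7 stays on the 32-bit path, while sio_add
on two values of 2000000000 leaves the 32-bit range and would be handed to
the big-integer library. The speed-up comes from the fact that most
operations never leave the fast path.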

Best,
Tobias