[polly] r240689 - Enable ISL's small integer optimization

Tobias Grosser tobias at grosser.es
Fri Jun 26 01:50:56 PDT 2015


On 06/26/2015 08:55 AM, Tobias Grosser wrote:
> On 06/25/2015 10:47 PM, Michael Kruse wrote:
>> Author: meinersbur
>> Date: Thu Jun 25 15:47:35 2015
>> New Revision: 240689
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=240689&view=rev
>> Log:
>> Enable ISL's small integer optimization
>>
>> Summary:
>> With the small integer optimization (short: sio) enabled, ISL uses 32-bit
>> integers for its arithmetic and only falls back to a big integer library
>> (in the case of Polly: IMath) if an operation's result is too large.
>> This gives a massive performance boost for most applications using ISL.
>> For instance, experiments with ppcg (a polyhedral source-to-source
>> compiler) show speed-ups of 5.8x compared to plain IMath and 2.7x
>> compared to GMP.
>>
>> In Polly, a smaller fraction of the total compile time is taken by ISL,
>> but the speed-ups are still very significant. The buildbots measure
>> compilation speed-ups of up to 1.8x (oourafft, floyd-warshall, symm). All
>> Polybench benchmarks compile in at least 9% less time, and about 20%
>> less on average.
>>
>> Detailed Polybench compile time results (median of 10):
>> correlation     -25.51%
>> covariance      -24.82%
>> 2mm             -26.64%
>> 3mm             -28.69%
>> atax            -13.70%
>> bicg            -10.78%
>> cholesky        -40.67%
>> doitgen         -11.60%
>> gemm            -11.54%
>> gemver          -10.63%
>> gesummv         -11.54%
>> mvt              -9.43%
>> symm            -41.25%
>> syr2k           -14.71%
>> syrk            -14.52%
>> trisolv         -17.65%
>> trmm             -9.78%
>> durbin          -19.32%
>> dynprog          -9.09%
>> gramschmidt     -15.38%
>> lu              -21.77%
>> floyd-warshall  -42.71%
>> reg_detect      -41.17%
>> adi             -36.69%
>> fdtd-2d         -32.61%
>> fdtd-apml       -21.90%
>> jacobi-1d-imper  -9.41%
>> jacobi-2d-imper -27.65%
>> seidel-2d       -31.00%
>
>
> Very nice. Here are the corresponding performance bot results:
>
> "clang -O3 -mllvm -polly" BEFORE
> vs.
> "clang -O3 -mllvm -polly" AFTER
>
> http://llvm.org/perf/db_default/v4/nts/27877?compare_to=27869
>
> "clang -O3"
> vs.
> "clang -O3 -mllvm -polly" BEFORE
>
> http://llvm.org/perf/db_default/v4/nts/27869?compare_to=27876
>
> "clang -O3"
> vs.
> "clang -O3 -mllvm -polly" AFTER
>
> http://llvm.org/perf/db_default/v4/nts/27877?compare_to=27876
>
>
> Instead of a 450% compile-time slowdown compared to plain LLVM, the
> largest slowdown is now 200% (0.13s to 0.44s), and most kernels show
> even less slowdown. A large part of the remaining slowdown (about half
> of it) is increased LLVM codegen time due to code versioning.
> A 200% slowdown may sound like a lot, but these are precisely the loop
> kernels we optimize, and their overall compile time is commonly less
> than a second. The compile-time impact on entire applications is
> usually much smaller.
>
> There is still some headroom for further optimizations, which we are
> aiming for. For now, however, thanks again to Michael Kruse for this
> great patch and to Pratik Bhatu for the preparations and preliminary
> studies that allowed us to understand the performance benefits such an
> optimization can give.
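
To make the mechanism Michael describes above a bit more concrete: the small
integer optimization keeps values in a plain 32-bit integer for as long as
overflow-checked arithmetic succeeds, and only promotes to the big integer
library when a result overflows. Below is a minimal sketch of that general
idea; the names and the struct layout are made up for illustration and do
not match ISL's actual implementation:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical representation: a value is either a plain 32-bit
     * integer or a handle to a big integer from a library such as
     * IMath. Names and layout are illustrative only. */
    typedef struct {
        bool     is_small; /* true: 'small' is valid; false: 'big' is valid */
        int32_t  small;
        void    *big;      /* stand-in for an IMath/GMP big-integer handle */
    } sio_val;

    /* Addition: try the 32-bit fast path; fall back to the big-integer
     * library only if the result overflows. */
    sio_val sio_add(sio_val a, sio_val b) {
        sio_val r = { false, 0, NULL };
        int32_t sum;
        if (a.is_small && b.is_small &&
            !__builtin_add_overflow(a.small, b.small, &sum)) {
            r.is_small = true;  /* common case: result still fits */
            r.small = sum;
            return r;
        }
        /* rare slow path: promote operands and use the big-integer
         * library (omitted in this sketch) */
        return r;
    }

The fast path is the common case for typical polyhedral inputs, which is
where the speed-ups reported above come from.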

For the record, besides Pratik's nice preparations, Michael had already
worked on a similar optimization for the GMP backend in the context of
his Molly project, and as a result he obtained valuable insights (and
experience) that proved very useful when doing this work.
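
Regarding the increased LLVM codegen time due to code versioning mentioned
above: Polly keeps the original code as a fallback and guards the optimized
version with a runtime check, so roughly twice the loop code reaches the
LLVM backend. A schematic illustration only (not Polly's actual output;
Polly works on LLVM IR, and both the runtime checks and the transformed
loop nest are more involved):

    void kernel(double *A, double *B, long n) {
        if (A + n <= B || B + n <= A) {    /* runtime no-alias check */
            /* Polly-optimized version of the loop nest (e.g. tiled or
             * vectorized); shown unchanged here for brevity */
            for (long i = 0; i < n; ++i)
                A[i] += B[i];
        } else {
            /* original version, kept as a fallback */
            for (long i = 0; i < n; ++i)
                A[i] += B[i];
        }
    }

Both versions have to go through LLVM's backend, which is where the extra
code generation time in the numbers above comes from.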

Best,
Tobias


