<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 26, 2015 at 2:20 PM, Tobias Grosser <span dir="ltr">&lt;<a href="mailto:tobias@grosser.es" target="_blank">tobias@grosser.es</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 06/26/2015 08:55 AM, Tobias Grosser wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 06/25/2015 10:47 PM, Michael Kruse wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Author: meinersbur<br>

Date: Thu Jun 25 15:47:35 2015<br>

New Revision: 240689<br>

<br>

URL: <a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.org_viewvc_llvm-2Dproject-3Frev-3D240689-26view-3Drev&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=CO8z2UvKDEAiXAUANT5RTyg36LOAUQTVsRSbzyXMB5g&s=s4A5fIFC8VT2LOGUxyTbuzuDcCEWfR0PS8CyMHuu8Ys&e=" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=240689&view=rev</a><br>

Log:<br>

Enable ISL's small integer optimization<br>

<br>

Summary:<br>

With small integer optimization (short: sio) enabled, ISL uses 32 bit<br>

integers for its arithmetic and only falls back to a big integer library<br>

(in the case of Polly: IMath) if an operation's result is too large.<br>

This gives a massive performance boost for most applications using ISL.<br>

For instance, experiments with ppcg (polyhedral source-to-source<br>

compiler) show speed-ups of 5.8 (compared to plain IMath) and<br>

2.7 (compared to GMP).<br>

<br>
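A minimal sketch of the fallback scheme described above, written in Python only for brevity (ISL itself is written in C). The names SioInt, fits_int32 and sio_mul are hypothetical and not part of ISL's API, and the big-integer path is merely simulated with Python's built-in integers.<br>

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Return True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


class SioInt:
    """Hypothetical tagged integer: 'small' while it fits in 32 bits."""

    def __init__(self, value):
        self.value = value
        # A value is demoted back to 'small' as soon as it fits again.
        self.is_small = fits_int32(value)


def sio_mul(lhs, rhs):
    """Multiply two SioInt values, preferring the small-integer path."""
    if lhs.is_small and rhs.is_small:
        # Fast path: in C this would be a plain machine multiplication
        # carried out in a wider type so the overflow check is safe.
        product = lhs.value * rhs.value
        if fits_int32(product):
            return SioInt(product)
    # Fallback: a C implementation would call into the big-integer
    # library here (assumption: simulated by Python's own integers).
    return SioInt(lhs.value * rhs.value)
```

Here 3 * 4 stays on the small-integer path, whereas 2**20 * 2**20 leaves the 32-bit range and takes the fallback path.<br>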

In Polly, a smaller fraction of the total compile time is taken by ISL,<br>

but the speed-ups are still very significant. The buildbots measure<br>

compilation speed-ups of up to 1.8 (oourafft, floyd-warshall, symm). All<br>

Polybench benchmarks compile in at least 9% less time, and about 20%<br>

less on average.<br>

<br>

Detailed Polybench compile time results (median of 10):<br>

correlation     -25.51%<br>

covariance      -24.82%<br>

2mm             -26.64%<br>

3mm             -28.69%<br>

atax            -13.70%<br>

bicg            -10.78%<br>

cholesky        -40.67%<br>

doitgen         -11.60%<br>

gemm            -11.54%<br>

gemver          -10.63%<br>

gesummv         -11.54%<br>

mvt              -9.43%<br>

symm            -41.25%<br>

syr2k           -14.71%<br>

syrk            -14.52%<br>

trisolv         -17.65%<br>

trmm             -9.78%<br>

durbin          -19.32%<br>

dynprog          -9.09%<br>

gramschmidt     -15.38%<br>

lu              -21.77%<br>

floyd-warshall  -42.71%<br>

reg_detect      -41.17%<br>

adi             -36.69%<br>

fdtd-2d         -32.61%<br>

fdtd-apml       -21.90%<br>

jacobi-1d-imper  -9.41%<br>

jacobi-2d-imper -27.65%<br>

seidel-2d       -31.00%<br>

</blockquote>

<br>

<br>

Very nice. Here are the corresponding performance bot results:<br>

<br>

"clang -O3 -mllvm -polly" BEFORE<br>

vs.<br>

"clang -O3 -mllvm -polly" AFTER<br>

<br>

<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.org_perf_db-5Fdefault_v4_nts_27877-3Fcompare-5Fto-3D27869&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=CO8z2UvKDEAiXAUANT5RTyg36LOAUQTVsRSbzyXMB5g&s=y1FU-6AINFxeNjlU1AYnjPxsx-Az7SpnAQo7S0SnLCc&e=" rel="noreferrer" target="_blank">http://llvm.org/perf/db_default/v4/nts/27877?compare_to=27869</a><br>

<br>

"clang -O3"<br>

vs.<br>

"clang -O3 -mllvm -polly" BEFORE<br>

<br>

<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.org_perf_db-5Fdefault_v4_nts_27869-3Fcompare-5Fto-3D27876&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=CO8z2UvKDEAiXAUANT5RTyg36LOAUQTVsRSbzyXMB5g&s=rn_a9F1dkQIXNjHgkg0roAhIW_6vmcv1i7MJXJqyUgs&e=" rel="noreferrer" target="_blank">http://llvm.org/perf/db_default/v4/nts/27869?compare_to=27876</a><br>

<br>

"clang -O3"<br>

vs.<br>

"clang -O3 -mllvm -polly" AFTER<br>

<br>

<a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__llvm.org_perf_db-5Fdefault_v4_nts_27877-3Fcompare-5Fto-3D27876&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=mQ4LZ2PUj9hpadE3cDHZnIdEwhEBrbAstXeMaFoB9tg&m=CO8z2UvKDEAiXAUANT5RTyg36LOAUQTVsRSbzyXMB5g&s=0jO3XZ15Zt6nVCpvSizjWuS5MbR9KKWvyoEdnn38mTI&e=" rel="noreferrer" target="_blank">http://llvm.org/perf/db_default/v4/nts/27877?compare_to=27876</a><br>

<br>

<br></blockquote></div></div></blockquote><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Instead of a 450% slowdown compared to LLVM in terms of compile time,<br>

the largest slowdown is now 200% (0.13 s to 0.44 s) and most kernels show<br>

even less slowdown. A large part of the remaining slowdown (about half<br>

of it) is increased LLVM codegen time due to code versioning.<br>

A 200% slowdown may sound like a lot, but these are precisely the loop kernels we<br>

optimize, meaning their overall compile time is commonly less than a<br>

second. The compile-time impact on entire applications is commonly a lot less.<br></blockquote></div></div></blockquote><div><br></div><div>Great indeed. I guess we can target a maximum of 100% slowdown next?</div><div> </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

There is still some headroom for further optimizations, which we are<br>

aiming for. However, for now, thanks again to Michael Kruse for this<br>

great patch and to Pratik Bhatu for the preparations and preliminary<br>

studies that allowed us to understand the performance benefits such an<br>

optimization gives.<br>

</blockquote>

<br></div></div>

For the record, besides Pratik's nice preparations, Michael had already worked on a similar optimization for the GMP backend in the context of his Molly project and as a result also obtained valuable insights (and<br>

experience) that have proven very useful when doing this work.</blockquote><div><br></div><div>I totally agree. Having Michael's prior work and experience was indeed the major factor! </div><div>Merci beaucoup/Danke Michael!</div><div><br></div><div>Best Regards</div><div>Ramakrishna</div></div></div></div>