[LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite
tanmx_star at yeah.net
Wed Jul 31 19:28:31 PDT 2013
I have also evaluated Polly's compile-time performance with our patch file for the polly-dependence pass. Results can be viewed on:
With this patch, Polly creates only a single parameter for memory accesses that share the same loop variable but have different base addresses. As a result, it significantly reduces compile time for some array-intensive benchmarks, such as lu (reduced by 83.65%) and AMGMK (reduced by 56.24%).
For our standard benchmark, as shown in http://llvm.org/bugs/show_bug.cgi?id=14240, the total compile time drops from 154.5389s to 0.0164s. In particular, the time spent in polly-dependence drops from 148.8800s (96.3% of the total) to 0.0066s (40.5%).
At 2013-07-31 01:03:11,"Star Tan" <tanmx_star at yeah.net> wrote:
Hi Tobias and all Polly developers,
I have re-evaluated the Polly compile-time performance using newest LLVM/Polly source code. You can view the results on http://126.96.36.199:8000.
In particular, I evaluated our r187102 patch file, which avoids expensive failure-string operations during normal execution. I evaluated two cases for it:
Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none
Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly
The "Polly-NoCodeGen" case is mainly used to compare the compile-time performance for the polly-detect pass. As shown in the results, our patch file could significantly reduce the compile-time overhead for some benchmarks such as tramp3dv4 (24.2%), simple_types_constant_folding(12.6%), oggenc(9.1%), loop_unroll(7.8%)
The "Polly-opt" case is used to compare the whole compile-time performance of Polly. Since our patch file mainly affects the Polly-Detect pass, it shows similar performance to "Polly-NoCodeGen". As shown in results, it reduces the compile-time overhead of some benchmarks such as tramp3dv4 (23.7%), simple_types_constant_folding(12.9%), oggenc(8.3%), loop_unroll(7.5%)
Finally, I evaluated the performance of the ScopBottomUp patch, which changes the top-down scop detection into a bottom-up scop detection. Results can be viewed here:
pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. LLVMPolly-ScopBottomUp.so) -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none
pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s. LLVMPolly-ScopBottomUp.so) -mllvm -polly
(*Both of these results are based on LLVM r187116, which includes the r187102 patch discussed above.)
Please note that this patch currently causes some failures in the Polly tests, so the numbers shown here cannot be regarded as conclusive. For example, the patch significantly reduces the compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop only because it treats the nested loop as an invalid scop and skips all subsequent transformations and optimizations. I evaluated it anyway to gauge its potential performance impact. Based on the results shown at http://188.8.131.52:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median, detecting scops bottom-up may further reduce Polly's compile time by more than 10%.