At 2013-08-12 01:18:30,"Tobias Grosser" <tobias@grosser.es> wrote:<br>>On 08/10/2013 06:59 PM, Star Tan wrote:<br>>> Hi all,<br>>><br>>> I have evaluated Polly's performance on the LLVM test-suite with the latest LLVM (r188054) and Polly (r187981).  Results can be viewed on: http://188.40.87.11:8000.<br>><br>>Hi Star Tan,<br>><br>>thanks for the update.<br>><br>>> There are mainly five new tests and each test is run with 10 samples:<br>>> clang (run id = 27):  clang -O3<br>>> pollyBasic (run id = 28):  clang -O3 -load LLVMPolly.so<br>>> pollyNoGen (run id = 29):  pollycc -O3 -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none<br>>> pollyNoOpt (run id = 30):  pollycc -O3 -mllvm -polly-optimizer=none<br>>> pollyOpt (run id = 31):  pollycc -O3<br>>><br>>><br>>> Here is the performance comparison for the newest Polly:<br>>>      http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=18&baseline=18<br>><br>>It seems the machine is down/unreachable at the moment?<br><br>I have restarted the LNT server. It is available now.<br><br>>> Overall, there are 198 benchmarks improved and 16 benchmarks regressed. 
In particular, with the recent performance-oriented patches for ScopDetect/ScopInfo/ScopDependences/..., we have significantly reduced the compile-time overhead of Polly for a large number of benchmarks, such as:<br>>>      SingleSource/Benchmarks/Misc/salsa20        -97.84%<br>>>      SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu        -85.01%<br>>>      MultiSource/Applications/obsequi/Obsequi        -57.12%<br>>>      SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d        -50.00%<br>>>      MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm        -40.09%<br>>>      MultiSource/Benchmarks/mediabench/gsm/toast/toast       -39.91%<br>>>      SingleSource/Benchmarks/Misc/whetstone       -39.02%<br>>>      MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg       -38.07%<br>>>      MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg       -37.70%<br>><br>>Very nice work!<br>><br>>> However, Polly can still lead to significant compile-time overhead for many benchmarks.<br>>> As shown on:<br>>>      http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28<br>>> there are 11 benchmarks whose compile time is more than 2x that of clang.  Furthermore, it seems that the PollyDependence pass is still one of the most expensive passes in Polly.<br>><br>>We need to look at these on a case-by-case basis. 2x compile time <br>>increase for large programs, where Polly is just run on small parts is <br>>not what we want. However, for small micro kernels (e.g. Polybench) <br>>where we can significantly increase the performance of the generated <br>>code, this is in fact a good baseline - especially as we did not spend <br>>too much time optimising this.<br><br>Yes, we should look into the trade-off between compile time and execution performance. 
<br>I have summarized the benchmarks whose compile-time overhead is 200% or more as follows:<br><br>SingleSource/Benchmarks/Shootout/nestedloop,<br>    compile_time(+6355.56%), execution_time(-99.21%)<br>SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d,<br>    compile_time(+1275.00%), execution_time(0%)<br>SingleSource/Benchmarks/Shootout-C++/nestedloop,<br>    compile_time(+1155.56%), execution_time(-99.23%)<br>MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk,<br>    compile_time(+491.80%), execution_time(0%)<br>SingleSource/UnitTests/Vector/multiplies,<br>    compile_time(+350.00%), execution_time(-13.64%)<br>SingleSource/Benchmarks/Stanford/Puzzle,<br>    compile_time(+345.45%), execution_time(-40.91%)<br>SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm,<br>    compile_time(+278.95%), execution_time(0%)<br>SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/3mm,<br>    compile_time(+270.73%), execution_time(0%)<br>SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk,<br>    compile_time(+208.57%), execution_time(0%)<br>SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm,<br>    compile_time(+202.63%), execution_time(0%)<br>SingleSource/Regression/C/test_indvars,<br>    compile_time(+200.00%), execution_time(0%)<br><br>The results show that for some benchmarks Polly leads to significant compile-time overhead without any improvement in execution performance.<br>I have reported a bug for nestedloop (http://llvm.org/bugs/show_bug.cgi?id=16843), and I will report further bugs for the benchmarks whose compile time increases significantly without any execution performance improvement.<br><br>Furthermore, you can view the top 10 compiler passes when compiling with Polly here:<br>https://gist.github.com/tanstar/581bcea1e4e03498f935/raw/f6a4ec4e8565f7a7bbdb924cd59fcf145caac039/Polly-top10<br><br>>> Even without optimization and code generation, Polly also increases the compile time for some benchmarks.<br>>> As shown 
on:<br>>>      http://188.40.87.11:8000/db_default/v4/nts/29?compare_to=28&baseline=28<br>>> there are 10 benchmarks that require more than 10% extra compile-time overhead compared with clang.<br>><br>>Having to pay at most 10% slowdown to decide if Polly should be run <br>>(including all the canonicalization) is actually not bad. Especially as <br>>the average on normal programs is probably a lot less.<br>><br>>Still, we should have a look into why this is happening for some of <br>>the biggest slowdowns.<br>><br>>Can you ping me when the server is up again? I would like to see which <br>>kernels are slowed down most.<br><br>The server is up now.<br><br>For your information, you can view the top 10 compiler passes when compiling with "Polly without optimization and code generation" here:<br>    https://gist.github.com/tanstar/40ece0e4e2bf9d052ca0/raw/9e892fe50544acca9609004941d4b3d4921cb302/Polly-NoGenOpt-top10<br><br>I have checked some benchmarks. It seems that the extra compile-time overhead is mainly caused by the following passes:<br>    Polly - Create polyhedral description of Scops<br>    Combine redundant instructions<br>    Polly - Detect static control parts (SCoPs)<br>    Induction Variable Simplification (Polly version)<br><br>Cheers,<br>Star Tan