<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=gb2312" http-equiv=Content-Type>

<STYLE>

BLOCKQUOTE {

        MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px; MARGIN-LEFT: 2em

}

OL {

        MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px

}

UL {

        MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px

}

BODY {

        LINE-HEIGHT: 1.5; FONT-FAMILY: verdana; COLOR: #000000; FONT-SIZE: 10pt

}

P {

        MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px

}

</STYLE>


<META name=GENERATOR content="MSHTML 9.00.8112.16464"></HEAD>

<BODY style="MARGIN: 10px">

<DIV style="FONT-FAMILY: Verdana">Dear Tobias,</DIV>

<DIV style="FONT-FAMILY: Verdana"> </DIV>

<DIV style="FONT-FAMILY: Verdana"></DIV>

<DIV style="FONT-FAMILY: Verdana">Sorry for the late reply. </DIV>

<DIV style="FONT-FAMILY: Verdana"> </DIV>

<DIV style="FONT-FAMILY: Verdana"></DIV>

<DIV style="FONT-FAMILY: Verdana">I rechecked the experiments and found that some 

of the data were mismatched because of mistakes made when copying and pasting by hand, so I 

wrote a shell script to collect the data automatically. The newest data are listed in 

the attached file.</DIV>

<DIV style="FONT-FAMILY: Verdana"> </DIV>

<DIV style="FONT-FAMILY: Verdana"></DIV>

<DIV style="FONT-FAMILY: Verdana">Tobias, I have made a simple HTML page 

(attached as polly-compiling-overhead.html) that shows the experimental data and my 

plans for this project. I think a public webpage would be helpful for our further 

discussion. If possible, could you put it on the Polly website (either as a public link 

or as a temporary webpage)?</DIV>

<DIV style="FONT-FAMILY: Verdana"></DIV>

<DIV style="FONT-FAMILY: Verdana">As a next step, I will try to remove unnecessary 

canonicalization transformations.</DIV>

<DIV style="FONT-FAMILY: Verdana"> </DIV>

<DIV></DIV>

<DIV>Thank you very much for your kind help.</DIV>

<DIV> </DIV>

<DIV></DIV>

<DIV>Best Regards,</DIV>

<DIV>Star Tan</DIV>

<DIV></DIV>

<DIV style="FONT-FAMILY: Verdana"> </DIV>

<DIV> </DIV>

<DIV 

style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0cm; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">

<DIV 

style="PADDING-BOTTOM: 8px; PADDING-LEFT: 8px; PADDING-RIGHT: 8px; BACKGROUND: #efefef; COLOR: #000000; FONT-SIZE: 12px; PADDING-TOP: 8px">

<DIV><B>From:</B> <A href="mailto:tobias@grosser.es">Tobias 

Grosser</A></DIV>

<DIV><B>Date:</B> 2013-03-20 21:06</DIV>

<DIV><B>To:</B> <A href="mailto:tanmx_star@yeah.net">Star Tan</A></DIV>

<DIV><B>CC:</B> <A href="mailto:llvmdev@cs.uiuc.edu">llvmdev</A></DIV>

<DIV><B>Subject:</B> Re: [Polly]GSoC Proposal: Reducing LLVM-Polly 

Compiling overhead</DIV></DIV></DIV>

<DIV>

<DIV>On 03/19/2013 11:02 AM, Star Tan wrote:</DIV>

<DIV>></DIV>

<DIV>> Dear Tobias Grosser,</DIV>

<DIV>></DIV>

<DIV>> Today I rebuilt LLVM-Polly in Release mode. My test machine has an Intel Pentium Dual CPU T2390 (1.86GHz) and 2GB of DDR2 memory.</DIV>

<DIV>> I evaluated Polly using PolyBench and MediaBench. Evaluating the whole LLVM test-suite takes too long, so I only chose MediaBench from the LLVM test-suite.</DIV>

<DIV> </DIV>

<DIV>OK. This is a good baseline.</DIV>

<DIV> </DIV>

<DIV>> The preliminary results for Polly's compile-time overhead are listed below:</DIV>

<DIV>></DIV>

<DIV>> Table 1: Compile-time overhead of Polly for PolyBench.</DIV>

<DIV>></DIV>

<DIV>> | Benchmark | Clang (seconds) | Polly-load (seconds) | Polly-optimize (seconds) | Polly-load penalty | Polly-optimize penalty |</DIV>

<DIV>> | 2mm.c | 0.155 | 0.158 | 0.75 | 1.9% | 383.9% |</DIV>

<DIV>> | correlation.c | 0.132 | 0.133 | 0.319 | 0.8% | 141.7% |</DIV>

<DIV>> | gesummv.c | 0.152 | 0.157 | 0.794 | 3.3% | 422.4% |</DIV>

<DIV>> | ludcmp.c | 0.157 | 0.159 | 0.391 | 1.3% | 149.0% |</DIV>

<DIV>> | 3mm.c | 0.103 | 0.109 | 0.122 | 5.8% | 18.4% |</DIV>

<DIV>> | covariance.c | 0.16 | 0.163 | 1.346 | 1.9% | 741.3% |</DIV>

<DIV> </DIV>

<DIV>This is a very large slowdown. On my system I get</DIV>

<DIV> </DIV>

<DIV>0.06 sec for Polly-load</DIV>

<DIV>0.09 sec for Polly-optimize</DIV>

<DIV> </DIV>

<DIV>What exact version of Polybench did you use? What compiler</DIV>

<DIV>flags did you use to compile the benchmark?</DIV>

<DIV>Also, did you run the executables several times? How large is the</DIV>

<DIV>standard deviation of the results? (You can use a tool like ministat to </DIV>

<DIV>calculate these values [1])</DIV>
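<DIV> </DIV>

<DIV>ministat summarizes a set of repeated measurements as N, min, max, median, average and standard deviation (and can additionally compare two data sets with a Student's t-test). A minimal Python sketch of just the summary part, using the standard statistics module, is shown below; the timing values are made-up placeholders, not data from this thread, and the sketch does not attempt ministat's t-test.</DIV>

<PRE>
```python
import statistics


def summarize(samples):
    """Summarize repeated timing runs in the style of ministat's summary line.

    Reports N, min, max, median, mean and the sample standard deviation
    (n - 1 in the denominator, as statistics.stdev computes it).
    At least two samples are required for the standard deviation.
    """
    return {
        "N": len(samples),
        "Min": min(samples),
        "Max": max(samples),
        "Median": statistics.median(samples),
        "Avg": statistics.mean(samples),
        "Stddev": statistics.stdev(samples),
    }


if __name__ == "__main__":
    # Placeholder compile times (seconds) for five runs of one benchmark;
    # these are illustrative values, not measurements from this thread.
    runs = [0.75, 0.74, 0.77, 0.76, 0.75]
    for key, value in summarize(runs).items():
        print(f"{key:7}{value:.4g}")
```
</PRE>

<DIV>Running each configuration several times and comparing Avg against Stddev makes it easier to judge whether a small difference, such as 0.155 s versus 0.158 s, is larger than the measurement noise.</DIV>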

<DIV> </DIV>

<DIV>> | gramschmidt.c | 0.159 | 0.167 | 1.023 | 5.0% | 543.4% |</DIV>

<DIV>> | seidel.c | 0.125 | 0.13 | 0.285 | 4.0% | 128.0% |</DIV>

<DIV>> | adi.c | 0.155 | 0.156 | 0.953 | 0.6% | 514.8% |</DIV>

<DIV>> | doitgen.c | 0.124 | 0.128 | 0.298 | 3.2% | 140.3% |</DIV>

<DIV>> | instrument.c | 0.149 | 0.151 | 0.837 | 1.3% | 461.7% |</DIV>

<DIV> </DIV>

<DIV>This number is surprising. In your last numbers you reported </DIV>

<DIV>Polly-optimize as taking 0.495 sec in debug mode. The time you now</DIV>

<DIV>report for the release mode is almost twice as much. Can you verify</DIV>

<DIV>this number please?</DIV>

<DIV> </DIV>

<DIV>> | atax.c | 0.135 | 0.136 | 0.917 | 0.7% | 579.3% |</DIV>

<DIV>> | gemm.c | 0.161 | 0.162 | 1.839 | 0.6% | 1042.2% |</DIV>

<DIV> </DIV>

<DIV>This number also looks fishy. In debug mode you reported 1.327 seconds for </DIV>

<DIV>Polly-optimize, which is again faster than what you now report for release mode.</DIV>

<DIV> </DIV>

<DIV>> | jacobi-2d-imper.c | 0.16 | 0.161 | 0.649 | 0.6% | 305.6% |</DIV>

<DIV>> | bicg.c | 0.149 | 0.152 | 0.444 | 2.0% | 198.0% |</DIV>

<DIV>> | gemver.c | 0.135 | 0.136 | 0.416 | 0.7% | 208.1% |</DIV>

<DIV>> | lu.c | 0.143 | 0.148 | 0.398 | 3.5% | 178.3% |</DIV>

<DIV>> | Average | | | | 2.20% | 362.15% |</DIV>

<DIV> </DIV>

<DIV>Otherwise, those numbers look like a good start. Maybe you can put them</DIV>

<DIV>on some website/wiki/document where you can extend them as you proceed </DIV>

<DIV>with benchmarking.</DIV>

<DIV> </DIV>

<DIV>> Table 2: Compile-time overhead of Polly for MediaBench (selected from the LLVM test-suite).</DIV>

<DIV>> | Benchmark | Clang (seconds) | Polly-load (seconds) | Polly-optimize (seconds) | Polly-load penalty | Polly-optimize penalty |</DIV>

<DIV>> | adpcm | 0.18 | 0.187 | 0.218 | 3.9% | 21.1% |</DIV>

<DIV>> | g721 | 0.538 | 0.538 | 0.803 | 0.0% | 49.3% |</DIV>

<DIV>> | gsm | 2.869 | 2.936 | 4.789 | 2.3% | 66.9% |</DIV>

<DIV>> | mpeg2 | 3.026 | 3.072 | 4.662 | 1.5% | 54.1% |</DIV>

<DIV>> | jpeg | 13.083 | 13.248 | 22.488 | 1.3% | 71.9% |</DIV>

<DIV>> | Average | | | | 1.80% | 52.65% |</DIV>
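<DIV> </DIV>

<DIV>The penalty columns in both tables appear to be (time with Polly - Clang time) / Clang time, with the Average row being the plain arithmetic mean of the per-benchmark penalties; this definition is inferred from the numbers rather than stated in the message. The Python sketch below recomputes Table 2 under that assumption and reproduces the reported values (for example 71.9% for jpeg and an average of 52.65%). Because a penalty of p% means compilation takes (1 + p/100) times as long, the same arithmetic turns the 362.15% PolyBench average into a factor of about 4.6X.</DIV>

<PRE>
```python
def penalty(baseline, measured):
    """Relative overhead of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Table 2 rows: (benchmark, Clang, Polly-load, Polly-optimize), in seconds.
TABLE2 = [
    ("adpcm", 0.18, 0.187, 0.218),
    ("g721", 0.538, 0.538, 0.803),
    ("gsm", 2.869, 2.936, 4.789),
    ("mpeg2", 3.026, 3.072, 4.662),
    ("jpeg", 13.083, 13.248, 22.488),
]


def table_penalties(rows):
    """Per-row (name, load penalty %, optimize penalty %) and column means."""
    per_row = [
        (name, penalty(clang, load), penalty(clang, opt))
        for name, clang, load, opt in rows
    ]
    avg_load = sum(row[1] for row in per_row) / len(per_row)
    avg_opt = sum(row[2] for row in per_row) / len(per_row)
    return per_row, avg_load, avg_opt


if __name__ == "__main__":
    rows, avg_load, avg_opt = table_penalties(TABLE2)
    for name, load_pct, opt_pct in rows:
        print(f"| {name} | {load_pct:.1f}% | {opt_pct:.1f}% |")
    print(f"| Average | {avg_load:.2f}% | {avg_opt:.2f}% |")
    # A penalty of p percent means compilation takes (1 + p/100) times as long.
    print(f"Polly-optimize slowdown factor: {1 + avg_opt / 100:.2f}X")
```
</PRE>

<DIV>Generating the penalty columns from the raw timings this way, instead of typing them in, also avoids the copy-and-paste mismatches mentioned at the top of this message.</DIV>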

<DIV> </DIV>

<DIV> </DIV>

<DIV>I ran jpeg myself to verify these numbers on my machine and got:</DIV>

<DIV> </DIV>

<DIV>A: -O3</DIV>

<DIV>B: -O3 -load LLVMPolly.so</DIV>

<DIV>C: -O3 -load LLVMPolly.so -mllvm -polly</DIV>

<DIV>D: -O3 -load LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none</DIV>

<DIV>E: -O3 -load LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none</DIV>

<DIV>    -mllvm -polly-code-generator=none</DIV>

<DIV> </DIV>

<DIV>| | A | B | C | D | E |</DIV>

<DIV>| jpeg | 5.1 | 5.2 | 8.0 | 7.9 | 5.5 |</DIV>

<DIV> </DIV>

<DIV>The overhead between A and C is similar to the one you report. Hence, </DIV>

<DIV>the numbers seem to be correct.</DIV>

<DIV> </DIV>

<DIV>I also added two more runs D and E to figure out where the slowdown </DIV>

<DIV>comes from. As you can see, most of the slowdown disappears when we</DIV>

<DIV>do not do code generation. This means either that Polly code </DIV>

<DIV>generation itself is slow or that the LLVM passes afterwards need more</DIV>

<DIV>time due to the code we generated (it contains many opportunities for </DIV>

<DIV>scalar simplifications). It would be interesting to see if this holds </DIV>

<DIV>for the other benchmarks and to investigate the actual reasons for the </DIV>

<DIV>slowdown. It is also interesting to see that just running Polly without </DIV>

<DIV>applying optimizations does not slow down compilation much. </DIV>

<DIV>Does this also hold for other benchmarks?</DIV>
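<DIV> </DIV>

<DIV>Subtracting the timings of consecutive configurations attributes the A to C overhead to individual steps. The Python sketch below does this for the jpeg numbers above, assuming the costs are additive and ignoring run-to-run noise (only one run per configuration was reported). Under those assumptions, about 2.4 of the 2.9 extra units (roughly 83%) fall on the D minus E step, i.e. Polly code generation plus the LLVM passes that run on the generated code; the sketch cannot separate those two.</DIV>

<PRE>
```python
# jpeg compile times reported above for configurations A to E (units as reported).
TIMES = {"A": 5.1, "B": 5.2, "C": 8.0, "D": 7.9, "E": 5.5}


def breakdown(t):
    """Attribute the A -> C overhead to the step each configuration removes.

    Assumes the costs are additive and ignores run-to-run noise.
    """
    return {
        "load LLVMPolly.so (B - A)": t["B"] - t["A"],
        "Polly without optimizer/codegen (E - B)": t["E"] - t["B"],
        "code generation + later passes (D - E)": t["D"] - t["E"],
        "scheduling optimizer (C - D)": t["C"] - t["D"],
    }


if __name__ == "__main__":
    total = TIMES["C"] - TIMES["A"]
    for step, cost in breakdown(TIMES).items():
        # Absolute cost and its share of the total A -> C overhead.
        print(f"{step:42} {cost:4.1f}  ({cost / total:5.1%})")
```
</PRE>

<DIV>Repeating this breakdown for the other benchmarks would show whether code generation dominates there as well.</DIV>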

<DIV> </DIV>

<DIV>> As shown in these two tables, Polly can significantly increase compile time for PolyBench, where it actually applies its optimizations. On average, Polly increases compile time by about 362% for PolyBench, i.e. compilation takes roughly 4.6X as long. Even for MediaBench, where Polly does not actually improve the efficiency of the generated code, it still increases compile time by about 53% (roughly 1.5X).</DIV>

<DIV>> Based on the above observations, I think we should not only reduce Polly's analysis and optimization time, but also make it bail out early when it cannot improve the efficiency of the generated code. This is very important if Polly is to be enabled by default for LLVM users.</DIV>

<DIV> </DIV>

<DIV>Bailing out early is definitely something we can think about.</DIV>

<DIV> </DIV>

<DIV>To get started here, you could e.g. look into the jpeg benchmark and </DIV>

<DIV>investigate in which files Polly spends a lot of time, where </DIV>

<DIV>exactly the time is spent and what kind of SCoPs Polly is optimizing. In </DIV>

<DIV>case we do not expect any benefit, we may skip code generation entirely.</DIV>

<DIV> </DIV>

<DIV>Thanks again for your interesting analysis.</DIV>

<DIV> </DIV>

<DIV>Cheers,</DIV>

<DIV>Tobi</DIV>

<DIV> </DIV>

<DIV>[1] https://github.com/codahale/ministat</DIV></DIV></BODY></HTML>