[LLVMdev] [FastPolly]: Update of Polly's performance on LLVM test-suite

Sun Aug 11 10:18:30 PDT 2013

On 08/10/2013 06:59 PM, Star Tan wrote:
> Hi all,
>
> I have evaluated Polly's performance on LLVM test-suite with latest LLVM (r188054) and Polly (r187981).  Results can be viewed on: http://188.40.87.11:8000.

Hi Star Tan,

thanks for the update.

> There are mainly five new tests and each test is run with 10 samples:
> clang (run id = 27):  clang -O3
> pollyBasic (run id = 28):  clang -O3 -load LLVMPolly.so
> pollyNoGen (run id = 29):  pollycc -O3 -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none
> pollyNoOpt (run id = 30):  pollycc -O3 -mllvm -polly-optimizer=none
> pollyOpt (run id = 31):  pollycc -O3
>
> Here is the performance comparison for the newest Polly:
>      http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=18&baseline=18

It seems the machine is down/unreachable at the moment?

> Overall, 198 benchmarks improved and 16 benchmarks regressed. In particular, with the recent performance-oriented patches for ScopDetect/ScopInfo/ScopDependences/..., we have significantly reduced the compile-time overhead of Polly for a large number of benchmarks, such as:
>      SingleSource/Benchmarks/Misc/salsa20        -97.84%
>      SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu        -85.01%
>      MultiSource/Applications/obsequi/Obsequi        -57.12%
>      SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d        -50.00%
>      MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm        -40.09%
>      MultiSource/Benchmarks/mediabench/gsm/toast/toast       -39.91%
>      SingleSource/Benchmarks/Misc/whetstone       -39.02%
>      MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg       -38.07%
>      MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg       -37.70%

Very nice work!

> However, Polly can still lead to significant compile-time overhead for many benchmarks.
> As shown on:
>      http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28
> there are 11 benchmarks whose compile time is more than 2x that of clang.  Furthermore, it seems that the PollyDependence pass is still one of the most expensive passes in Polly.

We need to look at these on a case-by-case basis. A 2x compile-time
increase for large programs, where Polly only runs on small parts, is
not what we want. However, for small micro kernels (e.g. Polybench)
where we can significantly increase the performance of the generated
code, this is in fact a good baseline, especially as we have not spent
much time optimising this.

> Even without optimization and code generation, Polly also increases the compile time for some benchmarks.
> As shown on:
>      http://188.40.87.11:8000/db_default/v4/nts/29?compare_to=28&baseline=28
> there are 10 benchmarks that require more than 10% extra compile-time overhead compared with clang.

Having to pay at most 10% slowdown to decide if Polly should be run
(including all the canonicalization) is actually not bad, especially as
the average on normal programs is probably a lot less.

Still, we should have a look into why this is happening for some of
the biggest slowdowns.
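
To make the thresholds discussed above concrete, here is a minimal
sketch of how one might rank benchmarks by compile-time overhead
relative to a baseline run. The helper names and all timing numbers are
made up for illustration; they do not come from the test-suite server
or from the LNT tooling.

```python
def percent_change(baseline, current):
    """Relative change of `current` versus `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


def flag_slowdowns(baseline_times, current_times, threshold_pct):
    """Return (benchmark, percent change) pairs above the threshold,
    largest slowdown first. Benchmarks missing from either run are skipped."""
    flagged = []
    for name, base in baseline_times.items():
        cur = current_times.get(name)
        if cur is None or base <= 0:
            continue
        change = percent_change(base, cur)
        if change > threshold_pct:
            flagged.append((name, change))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Hypothetical compile times in seconds (not real measurements).
baseline = {"kernel_a": 1.00, "kernel_b": 2.00, "app_c": 10.00}
no_gen = {"kernel_a": 1.05, "kernel_b": 2.50, "app_c": 10.20}
full_opt = {"kernel_a": 2.50, "kernel_b": 3.00, "app_c": 10.50}

# More than 10% extra compile time without optimization/code generation.
over_10_percent = flag_slowdowns(baseline, no_gen, 10.0)

# More than 2x the baseline compile time, i.e. more than +100%.
over_2x = flag_slowdowns(baseline, full_opt, 100.0)

print(over_10_percent)
print(over_2x)
```

Sorting by the largest change first gives the list of kernels worth
inspecting individually.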

Can you ping me when the server is up again? I would like to see which
kernels are slowed down most.

> For now, I will keep focusing on improving some expensive Polly passes such as PollyDependence.

Sure. Please keep me posted.

Tobi