[LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite

Fri Aug 2 17:28:18 PDT 2013

On 08/02/2013 07:10 AM, Star Tan wrote:
> At 2013-08-01 23:29:14,"Tobias Grosser" <tobias at grosser.es> wrote:
>
>>On 07/31/2013 09:23 PM, Star Tan wrote:
>>> At 2013-07-31 22:50:57,"Tobias Grosser" <tobias at grosser.es
[..]
>>I doubt the Polly changes changed performance much. However, there
>>have been huge numbers of patches to LLVM/clang. Those obviously changed
>>performance. The rerun test shows that our results in fact filter noise
>>out effectively. Can you check if this also holds for the original 0.01?
>
> No, it was set as 0.05 to filter out small delta.
>
> As you may need, I have reset it to 0.01 now.

Great. It seems that even with 0.01 we actually do not have a lot of noise.

http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=24&baseline=24
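
To illustrate why the cutoff matters, here is a minimal Python sketch. It is
not LNT's actual code: the function name, the benchmark names and the timings
are all made up, and it assumes the threshold is applied to the absolute
compile-time delta in seconds, which this thread does not spell out.

```python
# Hypothetical sketch, not LNT's implementation.
# Assumption: the "small delta" threshold is an absolute delta in seconds.

def significant_changes(baseline, current, threshold):
    """Return {benchmark: (delta, relative_change)} for deltas >= threshold."""
    report = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue  # nothing to compare against
        delta = current[name] - base
        if abs(delta) < threshold:
            continue  # treated as noise and hidden from the report
        report[name] = (delta, delta / base)
    return report


# Made-up compile times in seconds, for illustration only.
baseline = {"bench_a": 0.10, "bench_b": 2.00, "bench_c": 0.50}
current = {"bench_a": 0.13, "bench_b": 2.40, "bench_c": 0.505}

for threshold in (0.05, 0.01):
    kept = sorted(significant_changes(baseline, current, threshold))
    print(threshold, kept)
```

With these invented numbers, the short-running bench_a (a 30% relative change
on a 0.03 s delta) only shows up once the cutoff is lowered to 0.01, which is
the kind of effect to keep in mind when reading the reports.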

>>>>:-( I think here an up-to-date non-polly to polly comparison would come
>>>>in handy to see for which benchmarks we still see larger performance
>>>>regressions, and whether the bottom-up scop detection actually helps here.
>>>>As this is a larger patch, we should really have a need for it before
>>>>switching to it.
>>>>
>>> I have evaluated Polly compile-time performance for the following options:
>>>
>>>    clang: clang -O3  (runid: 14)
>>>
>>>    pBasic: clang -O3 -load LLVMPolly.so (runid:15)
>>>
>>>    pNoGen: pollycc -O3 -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none (runid:16)
>>>
>>>    pNoOpt: pollycc -O3 -mllvm -polly-optimizer=none (runid:17)
>>>
>>>    pOpt: pollycc -O3 (runid:18)
>>>
>>> For example, you can view the comparison between "clang" and "pNoGen" with:
>>>http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=14&baseline=14
>>>
>>> It shows that without optimizer and code generator, Polly would lead to less than 30% extra compile-time overhead.
>>
>>This is a step in the right direction, especially as most runs show a
>>lot less overhead.
>
> Yes, but this is based on the fact that I set the ignore_small threshold to 0.05 from the original 0.01.
>
> If the ignore_small threshold is set to 0.01, then some benchmarks show more than 30% compile-time overhead.
>
> As I said above, I have reset the ignore_small threshold to 0.01. Now you can view them with the same URL:
>
> http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=14&baseline=14

We probably want to look at
http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=15&baseline=15
which measures only the difference between enabling and disabling Polly,
not the overhead introduced by loading a shared object file.
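
For convenience, the comparison links for any pair of the runs listed above
can be built from the run ids. This is only a small sketch: the helper name is
made up, and the URL pattern is simply copied from the links in this thread.

```python
from urllib.parse import urlencode

# Base address and run ids as given in this thread.
BASE = "http://188.40.87.11:8000/db_default/v4/nts"
RUN_IDS = {"clang": 14, "pBasic": 15, "pNoGen": 16, "pNoOpt": 17, "pOpt": 18}


def compare_url(run, baseline):
    """Build the link comparing `run` against `baseline` (hypothetical helper)."""
    base_id = RUN_IDS[baseline]
    query = urlencode({"compare_to": base_id, "baseline": base_id})
    return f"{BASE}/{RUN_IDS[run]}?{query}"


# Overhead of Polly itself, excluding the cost of loading the shared object:
print(compare_url("pNoGen", "pBasic"))
# Overhead relative to plain clang:
print(compare_url("pNoGen", "clang"))
```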

>>> For the execution performance, it is interesting that pNoGen not only significantly improves the execution performance for some benchmarks (nestedloop/huffbench) but also significantly reduces the execution performance for another set of benchmarks (gcc-loops/lpbench).
>>
>>Yes, that is really interesting. I suspect a couple of our
>>canonicalization passes enabled/blocked additional optimizations in
>>LLVM. The huffbench kernel seems especially interesting. This is not
>>your number one priority in GSoC, but understanding why the gcc-loops
>>got so much worse may be interesting. I suspect this may be some kind of
>>generic LLVM issue we expose, and we should report a bug explaining the
>>issue.
>>
> Certainly, I will try to investigate huffbench after I commit the patch file for ScopInfo in the next few days.

Great. gcc-loops may also be interesting to look at (especially as it is 
an unwanted regression that we would like to fix).

Cheers,
Tobi