<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><span style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;">At 2013-09-09 05:02:14,"Tobias Grosser" <tobias@grosser.es> wrote:</span><br><pre>>On 09/08/2013 11:46 AM, Star Tan wrote:

>> At 2013-09-02 17:05:52,"Tobias Grosser" <tobias@grosser.es> wrote:

>>

>>> On 09/01/2013 08:02 PM, Star Tan wrote:

>>>> Hi all,

>>>>

>>>>

>>>> It seems that Polly's code generation can lead to high compile-time overhead, especially for PolyBench applications such as 2mm, 3mm, gemm, syrk, etc. Some basic evaluation and analysis of Polly's code generation can be found at http://llvm.org/bugs/show_bug.cgi?id=16898.

>>>>

>>>>

>>>> Currently, we can choose to run -polly-code-generator=cloog or -polly-code-generator=isl for code generation, but both of them lead to almost double the compile-time overhead for the 2mm benchmark. Unfortunately, neither Cloog nor ISL improves the execution time compared with -polly-code-generator=none. I think if we could identify in advance that code generation will not improve execution time, then we could skip the expensive Cloog and ISL code generators.

>>>>

>>>>

>>>> Can anyone provide some suggestions or hints on this problem?

>>>

>>> OK. I think in this case the problem is actually to figure out why Polly

>>> does not give a speedup in terms of execution time, because we have seen

>>> large speedups for 2mm and 3mm.

>>>

>>> Here is what I see:

>>>

>>> 2mm$ polly-clang 2mm.c -O3 -I ../../../utilities/ -DPOLYBENCH_TIME

>>> -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-ignore-aliasing

>>> 2mm$ time ./a.out

>>> 18.217128

>>>

>>> real  0m18.256s

>>> user  0m18.128s

>>> sys   0m0.064s

>>> 2mm$ polly-clang 2mm.c -O3 -I ../../../utilities/ -DPOLYBENCH_TIME

>>> -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-ignore-aliasing -mllvm -polly

>>> 2mm$ time ./a.out

>>> 4.986877

>>>

>>> real  0m5.036s

>>> user  0m4.940s

>>> sys   0m0.068s

>>>

>>> So the reason this does not work is that the polybench kernels in the

>>> test suite do not annotate the functions called with the 'restrict'

>>> keyword (that's why we need the ignore-aliasing) as well as that the

>>> size of the arrays is given as scalars but the corresponding loop bounds

>>> are not. It would be great to fix up those issues.

>>>

>>> The first issue can be fixed by adding run-time alias analysis checks.

>>> Adding those checks now became very easy with the new isl code

>>> generation. The basic idea is that we ask isl to generate the necessary

>>> run-time check and add it into the condition created by

>>> executeScopConditionally(). In case you are interested in looking into

>>> this, this would be a great help!

>>>

>> Thanks for your helpful reply. Yes, if we add -polly-ignore-aliasing, which skips the alias checking in ScopDetection, then we can detect the kernel loop as a valid scop and gain a significant performance improvement. I tried to follow your hints and look into executeScopConditionally() in CodeGen/Utils.cpp, but I cannot fully understand how to affect the ScopDetection pass by modifying executeScopConditionally(). Do you mean I can add ISL checking information into the Context in executeScopConditionally()? Could you give some more concrete ideas? Are there any code examples of ISL alias analysis?

>

>The point is that we can not just skip the alias analysis check. 

>However, skipping the alias-analysis check becomes safe in case we can 

>perform the necessary alias-analysis check at run-time.

>

>So the idea would be to enhance the isl code generation such that it can 

>emit a run-time check for certain cases of aliasing and to then allow 

>such cases in the SCoP detection. A simple run-time check is to

>take a set of base pointers that are in a may-alias set, and check that

>for two distinct base pointers that are part of this set, all accesses 

>can not overlap.

>

>To do this, I propose to take a simple example of two array accesses 

>with distinct base pointers that may alias and start from there. The 

>idea would be to collect for each of the base pointers all accesses that 

>use it, and to create an isl_pw_aff that is 'one' if the pointers do 

>overlap and 'zero' otherwise. You can use the isl code ast generator

>(isl_ast_build_expr_from_pw_aff()) to create LLVM IR that performs 

>exactly this check at run-time and you can use the result of this check 

>in executeScopConditionally() to only execute the modified SCoP, if we 

>found it safe to do so.<br></pre><pre>I see, you mean we can generate LLVM code for run-time alias checking to allow more valid scops in polly-detect. <span style="font-size: 14px; line-height: 1.7;">In that case, I think it may not be easy to implement such support, since the aliasing may be complex. Of course, we can start with some </span><span style="font-size: 14px; line-height: 1.7;">simple examples. I have added your suggestion to the original bug </span>16898 (<a href="http://llvm.org/bugs/show_bug.cgi?id=16898" style="font-size: 14px; line-height: 1.7;">http://llvm.org/bugs/show_bug.cgi?id=16898</a>) <span style="font-size: 14px; line-height: 1.7;">and I will try to move forward.</span></pre><pre>Thanks,</pre><pre>Star Tan</pre></div>
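The simple run-time check discussed in this thread (for any two distinct base pointers in a may-alias set, no accesses may overlap) can be illustrated in plain C. This is only a sketch under stated assumptions: access_range, ranges_overlap, and may_alias_set_is_disjoint are hypothetical names, not Polly or isl APIs, and the per-pointer byte ranges are assumed to be known already. In Polly they would presumably be derived from the SCoP's memory accesses and emitted as LLVM IR rather than written by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: the byte range a SCoP may touch through one
 * base pointer. Not a Polly or isl type. */
typedef struct {
    const void *base;   /* base pointer of the array */
    size_t min_offset;  /* smallest byte offset accessed */
    size_t max_offset;  /* one past the largest byte offset accessed */
} access_range;

/* Returns true if the half-open byte ranges of a and b intersect. */
static bool ranges_overlap(const access_range *a, const access_range *b) {
    uintptr_t a_lo = (uintptr_t)a->base + a->min_offset;
    uintptr_t a_hi = (uintptr_t)a->base + a->max_offset;
    uintptr_t b_lo = (uintptr_t)b->base + b->min_offset;
    uintptr_t b_hi = (uintptr_t)b->base + b->max_offset;
    return a_lo < b_hi && b_lo < a_hi;
}

/* Conservative pairwise check over a may-alias set: returns true only if
 * no two ranges in the set overlap. */
static bool may_alias_set_is_disjoint(const access_range *ranges, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (ranges_overlap(&ranges[i], &ranges[j])) {
                return false;
            }
        }
    }
    return true;
}
```

The result would play the role of the condition guarding the optimized SCoP: run the transformed code only when the check returns true, and fall back to the original loop nest otherwise.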