[LLVMdev] [Polly] Compile-time of Polly's code generation

Sun Sep 8 19:42:12 PDT 2013

At 2013-09-09 05:02:14,"Tobias Grosser" <tobias at grosser.es> wrote:

>On 09/08/2013 11:46 AM, Star Tan wrote:
>> At 2013-09-02 17:05:52,"Tobias Grosser" <tobias at grosser.es> wrote:
>>
>>> On 09/01/2013 08:02 PM, Star Tan wrote:
>>>> Hi all,
>>>>
>>>>
>>>> It seems that Polly's code generation can leads to high compile-time overhead, especially for PolyBench applications such as 2mm, 3mm, gemm, syrk, etc. Some basic evaluation and analysis for Polly's code generation can be referred to  http://llvm.org/bugs/show_bug.cgi?id=16898.
>>>>
>>>>
>>>> Currently, we can choose to run -polly-code-generator=cloog or -polly-code-generator=isl for code generation, but both of them lead to almost double compile-time overhead for the 2mm benchmark. Unfortunately, both Cloog and ISL can not improve the execution time compared with -polly-code-generator=none.  I think if we could identify it will not improve execution time in advance, then we can skip the expensive Cloog and ISL code generator.
>>>>
>>>>
>>>> Can any one provide some suggestions or hints on this problem?
>>>
>>> OK. I think in this case the problem is actually to figure out why Polly
>>> does not give a speedup in terms of execution time, because we have seen
>>> large speedups for 2mm and 3mm.
>>>
>>> Here is what I see:
>>>
>>> 2mm$ polly-clang 2mm.c -O3 -I ../../../utilities/ -DPOLYBENCH_TIME
>>> -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-ignore-aliasing
>>> 2mm$ time ./a.out
>>> 18.217128
>>>
>>> real	0m18.256s
>>> user	0m18.128s
>>> sys	0m0.064s
>>> 2mm$ polly-clang 2mm.c -O3 -I ../../../utilities/ -DPOLYBENCH_TIME
>>> -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-ignore-aliasing -mllvm -polly
>>> 2mm$ time ./a.out
>>> 4.986877
>>>
>>> real	0m5.036s
>>> user	0m4.940s
>>> sys	0m0.068s
>>>
>>> So the reason this does not work is that the polybench kernels in the
>>> test suite do not annotate the functions called with the 'restrict'
>>> keyword (that's whe we need the ignore-aliasing) as well as that the
>>> size of the arrays is given as scalars but the corresponding loop bounds
>>> are not. It would be great to fix up those issues.
>>>
>>> The first issue can be fixed by adding run-time alias analysis checks.
>>> Adding those checks now became very easy with the new isl code
>>> generation. The basic idea is that we ask isl to generate the necessary
>>> run-time check and add it into the condition created by
>>> executeScopConditionally(). In case you are interested in looking into
>>> this, this would be a great help!
>>>
>> Thanks for your helpful reply. Yes, if we add  -polly-ignore-aliasing, which skills the aliasing checking in ScopDetection, then we can detect the kernel loop as a valid scop and gain significant performance improvement.  I tried to follow your hints to look into the executeScopConditionally() in CodeGen/Utils.cpp, but I cannot fully understand how to affect ScopDetection pass by modifying the executionScopConditionally(). Do you mean I can add ISL checking information into the Context in executionScopConditionally()? Could you give some more concrete ideas? Is there any code examples about ISL alias analysis?
>
>The point is that we can not just skip the alias analysis check. 
>However, skipping the alias-analysis check becomes save in case we can 
>perform the necessary alias-analysis check at run-time.
>
>So the idea would be to enhance the isl code generation such that it can 
>emit a run-time check for certain cases of aliasing and to then allow 
>such cases in the SCoP detection. A simple run-time check is to
>take a set of base pointers that are in a may-alias set, and check that
>for two distinct base pointers that are part of this set, all accesses 
>can not overlap.
>
>To do this, I propose to take a simple example of two array accesses 
>with distinct base pointers that may alias and start from there. The 
>idea would be to collect for each of the base pointers all accesses that 
>use it, and to create an isl_pw_aff that is 'one' if the pointers do 
>overlap and 'zero' otherwise. You can use the isl code ast generator<
>(isl_ast_build_expr_from_pw_aff()) to create LLVM IR that performs 
>exactly this check at run-time and you can use the result of this check 
>in executeScopConditionally() to only execute the modified SCoP, if we 
>found it safe to do so.

I see, you mean we can generate LLVM code for runtime alias checking to allow more valid scops in polly-detect. In that case, I think it may be not easy to implement such support since the aliasing may be complex. Of course we can firstly take some  simple examples. I have added your suggestion to the original bug 16898 (http://llvm.org/bugs/show_bug.cgi?id=16898) and I will try to move forward.
Thanks,
Star Tan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130909/09a8c879/attachment.html>