[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

Thu Jul 25 23:14:51 PDT 2013

On 07/25/2013 09:01 PM, Star Tan wrote:
> Hi Sebastian,
>
>
> Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows:
> http://llvm.org/bugs/show_bug.cgi?id=14240
>
>
> My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240.
>
>
> Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated.

Hi Star Tan,

thanks for looking into this. Just to sum up, the test case that
shows this slowness is the following:

void bar();
void foo(short *input) {
   short x0, x1;
   char i, ctr;

   for(i = 0; i < 8; i++) {

     // SCoP begin
     for (ctr = 0; ctr < 8; ctr++) {
       x0 = input[i*64 + ctr*8 + 0] ;
       x1 = input[i*64 + ctr*8 + 1] ;

       input[i*64 + ctr*8 + 0] = x0 - x1;
       input[i*64 + ctr*8 + 1] = x0 + x1;
       input[i*64 + ctr*8 + 2] = x0 * x1;
     }
     // SCoP end

     bar(); // Unknown function call stops further expansion of SCoP
   }
}

Which is translated to the following scop:

     Context:
     [p_0, p_1, p_2] -> {  : p_0 >= -2147483648 and p_0 <= 2147483647 
and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and 
p_2 <= 2147483647 }
     p_0: {0,+,128}<%for.cond2.preheader>
     p_1: {2,+,128}<%for.cond2.preheader>
     p_2: {4,+,128}<%for.cond2.preheader>
[...]

The interesting observation is, that Polly introduces three parameters 
(p_0, p_1, p_2) for this SCoP, even though in the C source code only the 
variable 'i' is SCoP invariant. However, due to the way 
SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant 
SCEVExpr(essions) which are all translated into independent parameters.
However, as we can see, the only difference between the three
parameters is a different constant in the base of the AddRecExpr. If we 
would just introduce p_0 (the parameter where the scev-base is zero) and 
express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve 
this problem very quickly.
There are other ways to improve performance which include further tuning 
isl or reducing the precision of the analysis, but in this case
I don't think we need to look into them. The above fix should give us 
good performance and additionally also increases the precision of the 
result (as isl now understands the relation between the different 
parameters).

To fix this, you probably would like to look into the SCEVAffinator 
class and the way parameters are handled.

Cheers,
Tobias