[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
Tobias Grosser
tobias at grosser.es
Thu Jul 25 23:14:51 PDT 2013
On 07/25/2013 09:01 PM, Star Tan wrote:
> Hi Sebastian,
>
>
> Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows:
> http://llvm.org/bugs/show_bug.cgi?id=14240
>
>
> My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240.
>
>
> Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated.
Hi Star Tan,
thanks for looking into this. Just to sum up, the test case that
shows this slowness is the following:
void bar();
void foo(short *input) {
short x0, x1;
char i, ctr;
for(i = 0; i < 8; i++) {
// SCoP begin
for (ctr = 0; ctr < 8; ctr++) {
x0 = input[i*64 + ctr*8 + 0] ;
x1 = input[i*64 + ctr*8 + 1] ;
input[i*64 + ctr*8 + 0] = x0 - x1;
input[i*64 + ctr*8 + 1] = x0 + x1;
input[i*64 + ctr*8 + 2] = x0 * x1;
}
// SCoP end
bar(); // Unknown function call stops further expansion of SCoP
}
}
Which is translated to the following scop:
Context:
[p_0, p_1, p_2] -> { : p_0 >= -2147483648 and p_0 <= 2147483647
and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and
p_2 <= 2147483647 }
p_0: {0,+,128}<%for.cond2.preheader>
p_1: {2,+,128}<%for.cond2.preheader>
p_2: {4,+,128}<%for.cond2.preheader>
[...]
The interesting observation is, that Polly introduces three parameters
(p_0, p_1, p_2) for this SCoP, even though in the C source code only the
variable 'i' is SCoP invariant. However, due to the way
SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant
SCEVExpr(essions) which are all translated into independent parameters.
However, as we can see, the only difference between the three
parameters is a different constant in the base of the AddRecExpr. If we
would just introduce p_0 (the parameter where the scev-base is zero) and
express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve
this problem very quickly.
There are other ways to improve performance which include further tuning
isl or reducing the precision of the analysis, but in this case
I don't think we need to look into them. The above fix should give us
good performance and additionally also increases the precision of the
result (as isl now understands the relation between the different
parameters).
To fix this, you probably would like to look into the SCEVAffinator
class and the way parameters are handled.
Cheers,
Tobias
More information about the llvm-dev
mailing list