[LLVMdev] [Polly] Parallelizing outer loop containing inner reduction loop

Tobias Grosser tobias at grosser.es
Mon Feb 4 07:24:04 PST 2013


Hi Dmitry,

[FORTRAN code]

>     The difference between two versions is only in the way how inner
>     reduction loop is implemented. In first case it is explicit and in
>     second case - uses sum intrinsic, which is lowered to plain code by
>     DragonEgg. In first case Polly successfully detects parallel outer
>     loop, in second case - it can't.

Thanks for the test case.

>     After some attempts to modify the Polly behavior, I concluded it
>     cannot resist the presence of non-IV PHI-node. Polly attempts to
>     produce independent basic blocks, trying to gather dependent
>     instructions in single block and allocating scratch variables.
>     Allocas make the situation even worse, because they are always
>     placed into the first block of function, completely killing
>     potential parallelism.

As a first step, I was trying to reproduce your observations. When 
optimizing the kernels with polly, possible aliasing prevents polly to 
detect the scop.

$ polly-opt -O3 -polly 1.ll -debug-only=polly-detect

Checking region: Loop Function Root => <Function Return>
	Top level region is invalid
Checking region: 3.cloned.header => return.loopexit.exitStub
	Possible aliasing: "[0 x double]* inttoptr (i64 47255179264 to [0 x 
double]*)", "[0 x double]* inttoptr (i64 47246786560 to [0 x double]*)"
Checking region: 5.cloned => 6.cloned
	Possible aliasing: "[0 x double]* inttoptr (i64 47246786560 to [0 x 
double]*)", "[0 x double]* inttoptr (i64 47246749696 to [0 x double]*)"

$ polly-opt -O3 -polly 2.ll -debug-only=polly-detect
Checking region: Loop Function Root => <Function Return>
	Top level region is invalid
Checking region: 3.cloned.header => return.loopexit.exitStub
	Possible aliasing: "[0 x double]* inttoptr (i64 47255179264 to [0 x 
double]*)", "[0 x double]* inttoptr (i64 47246786560 to [0 x double]*)"
Checking region: 4.cloned => 5.cloned
	Possible aliasing: "[0 x double]* inttoptr (i64 47255179264 to [0 x 
double]*)", "[0 x double]* inttoptr (i64 47246786560 to [0 x double]*)"

The problematic aliasing is due to these inttoptr expressions to some 
fixed based addresses. The basic-alias-analysis can not analyze them at 
all. The scev-alias-analysis understands that the base addresses can not 
alias, however it is impossible to prove complete independence of
any derived address. Hence, to reason about the addresses we would need
to teach Polly to understand that there is no aliasing for the subset of 
the memory space that is accessed within the Scop.

In any case, you seemed to have in some way convinced Polly to accept 
this code. Would you mind to share what you did?


Regarding your problem. As far as I understand, the problem is that the 
following code:

for (i
   A[i] = 0
   for (j
       A[i] +=
   ... = A[i]

is changed by gcc (and other optimizations) to:

for (i
   A[i] = 0
   tmp = A[i]

   for (j
       tmp +=

   A[i] = tmp
   ... = A[i]

This is a common optimization that unfortunately introduces a lot of 
dependences on tmp that block any direct parallelization. To parallelize 
the loop anyway we would need to expand the memory of tmp, such that 
each parallel thread has a private copy of 'tmp'. Deciding where and how 
to expand memory to enable further transformations is in general 
difficult such that I would normally run Polly before such optimizations 
are performed. Tough, in your case it may still be possible to 
parallelize the loop. To do this, you would need to ignore all 
dependences that can be removed by creating thread private copies of 
'tmp'. If you are interested in having this implemented either open a 
bug report or give it a try yourself. I am happy to assist.

Cheers
Tobi



More information about the llvm-dev mailing list