[LLVMdev] [Polly] Parallelizing outer loop containing inner reduction loop
Tobias Grosser
tobias at grosser.es
Mon Feb 4 07:24:04 PST 2013
Hi Dmitry,
[FORTRAN code]
> The difference between two versions is only in the way how inner
> reduction loop is implemented. In first case it is explicit and in
> second case - uses sum intrinsic, which is lowered to plain code by
> DragonEgg. In first case Polly successfully detects parallel outer
> loop, in second case - it can't.
Thanks for the test case.
> After some attempts to modify the Polly behavior, I concluded it
> cannot resist the presence of non-IV PHI-node. Polly attempts to
> produce independent basic blocks, trying to gather dependent
> instructions in single block and allocating scratch variables.
> Allocas make the situation even worse, because they are always
> placed into the first block of function, completely killing
> potential parallelism.
As a first step, I was trying to reproduce your observations. When
optimizing the kernels with polly, possible aliasing prevents polly to
detect the scop.
$ polly-opt -O3 -polly 1.ll -debug-only=polly-detect
Checking region: Loop Function Root => <Function Return>
Top level region is invalid
Checking region: 3.cloned.header => return.loopexit.exitStub
Possible aliasing: "[0 x double]* inttoptr (i64 47255179264 to [0 x
double]*)", "[0 x double]* inttoptr (i64 47246786560 to [0 x double]*)"
Checking region: 5.cloned => 6.cloned
Possible aliasing: "[0 x double]* inttoptr (i64 47246786560 to [0 x
double]*)", "[0 x double]* inttoptr (i64 47246749696 to [0 x double]*)"
$ polly-opt -O3 -polly 2.ll -debug-only=polly-detect
Checking region: Loop Function Root => <Function Return>
Top level region is invalid
Checking region: 3.cloned.header => return.loopexit.exitStub
Possible aliasing: "[0 x double]* inttoptr (i64 47255179264 to [0 x
double]*)", "[0 x double]* inttoptr (i64 47246786560 to [0 x double]*)"
Checking region: 4.cloned => 5.cloned
Possible aliasing: "[0 x double]* inttoptr (i64 47255179264 to [0 x
double]*)", "[0 x double]* inttoptr (i64 47246786560 to [0 x double]*)"
The problematic aliasing is due to these inttoptr expressions to some
fixed based addresses. The basic-alias-analysis can not analyze them at
all. The scev-alias-analysis understands that the base addresses can not
alias, however it is impossible to prove complete independence of
any derived address. Hence, to reason about the addresses we would need
to teach Polly to understand that there is no aliasing for the subset of
the memory space that is accessed within the Scop.
In any case, you seemed to have in some way convinced Polly to accept
this code. Would you mind to share what you did?
Regarding your problem. As far as I understand, the problem is that the
following code:
for (i
A[i] = 0
for (j
A[i] +=
... = A[i]
is changed by gcc (and other optimizations) to:
for (i
A[i] = 0
tmp = A[i]
for (j
tmp +=
A[i] = tmp
... = A[i]
This is a common optimization that unfortunately introduces a lot of
dependences on tmp that block any direct parallelization. To parallelize
the loop anyway we would need to expand the memory of tmp, such that
each parallel thread has a private copy of 'tmp'. Deciding where and how
to expand memory to enable further transformations is in general
difficult such that I would normally run Polly before such optimizations
are performed. Tough, in your case it may still be possible to
parallelize the loop. To do this, you would need to ignore all
dependences that can be removed by creating thread private copies of
'tmp'. If you are interested in having this implemented either open a
bug report or give it a try yourself. I am happy to assist.
Cheers
Tobi
More information about the llvm-dev
mailing list