[LLVMdev] [Polly] Parallelizing outer loop containing inner reduction loop

Dmitry Mikushin dmitry at kernelgen.org
Mon Feb 4 07:55:56 PST 2013


Hi Tobi,

Thanks for looking into this!

2013/2/4 Tobias Grosser <tobias at grosser.es>
>
> In any case, you seem to have somehow convinced Polly to accept this
> code. Would you mind sharing what you did?
>

Sure. Aliasing is simply ignored. Instead, we substitute pointers and
sizes for arrays and run a special pass that converts the memory accesses
of every SCoP statement into ISL general form. Sorry, we are quite far from
the standard Polly invocation process; maybe I should prepare a simplified
plugin for testing purposes...


>
>
> Regarding your problem. As far as I understand, the problem is that the
> following code:
>
> for (i
>   A[i] = 0
>   for (j
>       A[i] +=
>   ... = A[i]
>
> is changed by gcc (and other optimizations) to:
>
> for (i
>   A[i] = 0
>   tmp = A[i]
>
>   for (j
>       tmp +=
>
>   A[i] = tmp
>   ... = A[i]
>

Yes, exactly!
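
For readers of the archive, the two forms quoted above can be written out
as compilable C. This is only an illustrative sketch: the function names,
the input array B, the output array C (standing in for "... = A[i]") and
the loop bounds n and m are assumptions, not code from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Form 1: the inner reduction accumulates directly into A[i]. */
void reduce_in_memory(size_t n, size_t m, const double *B,
                      double *A, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++) {
        A[i] = 0.0;
        for (size_t j = 0; j < m; j++)
            A[i] += B[i * m + j];
        C[i] = A[i];                /* stands in for "... = A[i]" */
    }
}

/* Form 2: after scalar promotion the reduction runs on tmp.
 * tmp is one storage location reused by every i iteration, which
 * is where the extra dependences between iterations come from. */
void reduce_in_scalar(size_t n, size_t m, const double *B,
                      double *A, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    double tmp;
    for (size_t i = 0; i < n; i++) {
        A[i] = 0.0;
        tmp = A[i];
        for (size_t j = 0; j < m; j++)
            tmp += B[i * m + j];
        A[i] = tmp;
        C[i] = A[i];
    }
}
```

Both functions compute the same A and C; only the second one carries the
shared scalar across outer iterations.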


>
> This is a common optimization that unfortunately introduces a lot of
> dependences on tmp that block any direct parallelization. To parallelize
> the loop anyway we would need to expand the memory of tmp, such that each
> parallel thread has a private copy of 'tmp'. Deciding where and how to
> expand memory to enable further transformations is in general difficult,
> so I would normally run Polly before such optimizations are
> performed. Though, in your case it may still be possible to parallelize the
> loop. To do this, you would need to ignore all dependences that can be
> removed by creating thread-private copies of 'tmp'. If you are interested
> in having this implemented, either open a bug report or give it a try
> yourself. I am happy to assist.
>

Hm, how would we create thread-private copies of tmp at that point, and how
useful would that be? The problem is that the platform-dependent view of
threads only enters the process once the loop has been proven parallel.
Before that, as far as I know, Polly/CLooG/ISL can't be aware of such
things. I was thinking more about pushing the AllocaInst-s down closer to
the nested loop header. Would that work?
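
At the C level, the two options under discussion might look roughly like
the sketch below. It is only an analogy under stated assumptions: the
function names, the input array B and the caller-provided scratch array
are hypothetical, and this is not what Polly or KernelGen actually emit.
Option A expands tmp to one slot per outer iteration; option B declares
tmp inside the outer loop body, which is the source-level counterpart of
placing its allocation next to the inner loop.

```c
#include <assert.h>
#include <stddef.h>

/* Option A: memory expansion. Each outer iteration i owns tmp[i],
 * so no storage location is shared between i iterations.
 * The scratch array tmp must hold at least n elements. */
void reduce_expanded(size_t n, size_t m, const double *B,
                     double *A, double *tmp)
{
    assert(A != NULL && B != NULL && tmp != NULL);
    for (size_t i = 0; i < n; i++) {
        tmp[i] = 0.0;
        for (size_t j = 0; j < m; j++)
            tmp[i] += B[i * m + j];
        A[i] = tmp[i];
    }
}

/* Option B: tmp is scoped to the outer loop body, so each
 * iteration (and hence each thread running it) gets its own copy. */
void reduce_block_scoped(size_t n, size_t m, const double *B, double *A)
{
    assert(A != NULL && B != NULL);
    for (size_t i = 0; i < n; i++) {
        double tmp = 0.0;
        for (size_t j = 0; j < m; j++)
            tmp += B[i * m + j];
        A[i] = tmp;
    }
}
```

In both variants the only value carried through tmp stays inside a single
i iteration, so the outer loop has no cross-iteration dependence through
it.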

Thanks,
- D.