[LLVMdev] [Polly] Parallelizing outer loop containing inner reduction loop
Dmitry Mikushin
dmitry at kernelgen.org
Mon Feb 4 07:55:56 PST 2013
Hi Tobi,
Thanks for looking into this!
2013/2/4 Tobias Grosser <tobias at grosser.es>
>
> In any case, you seem to have somehow convinced Polly to accept this
> code. Would you mind sharing what you did?
>
Sure. Aliasing is simply ignored. Instead, we substitute pointers and
sizes for arrays and have a special pass that converts the memory accesses
of every scop statement into ISL general form. Sorry, our setup is quite
far from the standard Polly invocation process; maybe I should prepare a
simplified plugin for testing purposes...
>
>
> Regarding your problem. As far as I understand, the problem is that the
> following code:
>
> for (i
>   A[i] = 0
>   for (j
>     A[i] +=
>   ... = A[i]
>
> is changed by gcc (and other optimizations) to:
>
> for (i
>   A[i] = 0
>   tmp = A[i]
>
>   for (j
>     tmp +=
>
>   A[i] = tmp
>   ... = A[i]
>
Yes, exactly!
>
> This is a common optimization that unfortunately introduces a lot of
> dependences on tmp that block any direct parallelization. To parallelize
> the loop anyway we would need to expand the memory of tmp, such that each
> parallel thread has a private copy of 'tmp'. Deciding where and how to
> expand memory to enable further transformations is in general difficult,
> so I would normally run Polly before such optimizations are
> performed. Though, in your case it may still be possible to parallelize the
> loop. To do this, you would need to ignore all dependences that can be
> removed by creating thread private copies of 'tmp'. If you are interested
> in having this implemented either open a bug report or give it a try
> yourself. I am happy to assist.
>
Hm, how would one create thread-private copies of tmp at that point, and how
useful would it be? The problem is that the platform-dependent view of
threads only enters the process once the loop has been proved parallel.
Before that, as far as I know, Polly/CLooG/ISL cannot be aware of such
things. I was thinking more about pushing the AllocaInst-s down closer to
the nested array header - would that work?
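
For reference, here is a minimal C sketch of the two shapes under discussion.
The input array B, the bounds n and m, and the function names are
hypothetical, since the snippet above elides the reduction body. The first
function keeps a single tmp outside the outer loop, which is the form that
carries the extra dependences. The second declares tmp inside the outer loop
body, which is roughly what moving the alloca down towards the inner loop
would look like at the source level, assuming I read the idea correctly.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar-promoted form: a single 'tmp' lives outside the i-loop, so every
 * outer iteration reads and writes the same storage location.
 * B is a hypothetical n-by-m row-major input; A receives one sum per row. */
void row_sums_shared_tmp(const double *B, double *A, size_t n, size_t m) {
    assert(A != NULL && B != NULL);
    double tmp;
    for (size_t i = 0; i < n; i++) {
        A[i] = 0.0;
        tmp = A[i];
        for (size_t j = 0; j < m; j++)
            tmp += B[i * m + j];
        A[i] = tmp;
    }
}

/* Privatized form: 'tmp' is declared inside the i-loop body, so each outer
 * iteration gets its own copy and no value of 'tmp' crosses iterations. */
void row_sums_private_tmp(const double *B, double *A, size_t n, size_t m) {
    assert(A != NULL && B != NULL);
    for (size_t i = 0; i < n; i++) {
        double tmp = 0.0;
        for (size_t j = 0; j < m; j++)
            tmp += B[i * m + j];
        A[i] = tmp;
    }
}
```

Both functions compute the same result; only the second leaves the outer
loop without a scalar carried from one iteration to the next.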
Thanks,
- D.