[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization

Mon Jan 28 03:53:26 PST 2013

Hi Nick,

On 01/28/2013 12:17 PM, Nick Lewycky wrote:
> Aren't all loops in OpenCL parallel? Or are you planning to inline

The intra-kernel loops (the ones the OpenCL C programmer writes) are not
parallel by default. Only the implicit "work group loops" (which iterate
over the work items in the local work space for the regions between
barriers) are.
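
As a rough illustration of the distinction, here is a minimal C++ sketch.
It is not code from any OpenCL implementation; kernel_body, run_work_group
and the row-per-work-item data layout are hypothetical names and choices
made up for this example. The inner loop stands in for a loop the kernel
author wrote and carries a dependence between iterations, while the outer
loop plays the role of the implicit work group loop, whose iterations are
independent of each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel body for a single work item. The intra-kernel loop is
// a running sum over the work item's own row, so iteration i depends on
// iteration i - 1 and the loop is not parallel.
void kernel_body(std::size_t local_id, std::size_t row_len,
                 std::vector<float>& data) {
    float* row = data.data() + local_id * row_len;
    for (std::size_t i = 1; i < row_len; ++i)
        row[i] += row[i - 1];
}

// Stand-in for the implicit work group loop: one iteration per work item.
// Each work item touches only its own row, so these iterations could run in
// any order (there is no barrier in this example).
void run_work_group(std::size_t local_size, std::size_t row_len,
                    std::vector<float>& data) {
    assert(data.size() == local_size * row_len);
    for (std::size_t local_id = 0; local_id < local_size; ++local_id)
        kernel_body(local_id, row_len, data);
}
```

Calling run_work_group(2, 3, data) on two rows of three elements computes
an independent running sum per row.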

> non-OpenCL code into your OpenCL code before running the vectorizer? If
> not, just have the vectorizer run as part of the pipeline you set up
> when producing IR from OpenCL code. That it would miscompile non-OpenCL
> code is irrelevant.

I (still) think a cleaner and more modular approach is to simply add
parallel-loop awareness to the regular vectorizer. This should also help
other languages that have parallel loop constructs.

The basic idea is to use a loop interchange-style optimization to convert
the work group function into a generic inner-loop vectorization problem.
This effectively performs outer-loop vectorization, as Nadav Rotem
suggested. Let's see how it goes.
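
The interchange idea can be sketched in plain C++ as follows. This is only
an illustration under stated assumptions, not the planned patch: the
function names and the running-sum kernel are hypothetical, the kernel loop
is assumed to have the same trip count for every work item, and there are
no barriers inside it. After the interchange the parallel work-item loop is
innermost, which is the shape an inner-loop vectorizer looks for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: the parallel work-item loop (wi) is outermost and the sequential
// kernel loop (i) is nested inside it.
void prefix_sum_wi_outer(std::size_t local_size, std::size_t row_len,
                         std::vector<float>& data) {
    assert(data.size() == local_size * row_len);
    for (std::size_t wi = 0; wi < local_size; ++wi)
        for (std::size_t i = 1; i < row_len; ++i)
            data[wi * row_len + i] += data[wi * row_len + i - 1];
}

// After interchange: the sequential kernel loop is outermost and the
// work-item loop is innermost. The only dependence is along i within one
// work item, so swapping the loops preserves the result, and the inner
// loop's iterations are independent of each other.
void prefix_sum_wi_inner(std::size_t local_size, std::size_t row_len,
                         std::vector<float>& data) {
    assert(data.size() == local_size * row_len);
    for (std::size_t i = 1; i < row_len; ++i)
        for (std::size_t wi = 0; wi < local_size; ++wi)
            data[wi * row_len + i] += data[wi * row_len + i - 1];
}
```

Both functions produce the same output for the same input; only the loop
order differs.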

> + for (BasicBlock::iterator ii = header->begin();
> + ii != header->end(); ii++) {
>
> http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop

Thanks. I'll send an updated patch shortly in a separate
email thread.
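
For reference, the loop form that the linked coding standard recommends
can be sketched as below. A std::list<int> stands in for the basic block so
that the example compiles without LLVM; sum_with_cached_end is a
hypothetical name used only here. The end iterator is evaluated once in the
loop initializer, and the iterator is advanced with preincrement.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Sums the elements while evaluating end() only once. The list is a
// stand-in for a basic block and its instructions.
int sum_with_cached_end(const std::list<int>& insts) {
    int sum = 0;
    std::size_t visited = 0;
    for (std::list<int>::const_iterator ii = insts.begin(),
                                        ie = insts.end();
         ii != ie; ++ii) {
        sum += *ii;
        ++visited;
    }
    // Sanity check: caching end() did not change how many elements we saw.
    assert(visited == insts.size());
    return sum;
}
```

Caching the end iterator is only valid when the loop body does not add or
remove elements, which is the case in this sketch.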

BR,
-- 
Pekka