[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization

Mon Jan 28 02:17:07 PST 2013

Pekka Jääskeläinen wrote:
> On 01/25/2013 04:21 PM, Hal Finkel wrote:
>> My point is that I specifically think that you should try it. I'm curious
>> to see how what you come up with might apply to other use cases as well.
>
> OK, attached is the first quick attempt towards this. I'm not
> proposing committing this, but would like to get comments
> to possibly move towards something committable.
>
> It simply looks for a metadata named 'parallel_for' in any of the
> instructions in the loop's header and assumes the loop is a parallel
> one if such is found.

Aren't all loops in OpenCL parallel? Or are you planning to inline 
non-OpenCL code into your OpenCL code before running the vectorizer? If 
not, just have the vectorizer run as part of the pipeline you set up 
when producing IR from OpenCL code. That it would miscompile non-OpenCL 
code is irrelevant.

+  for (BasicBlock::iterator ii = header->begin();
+       ii != header->end(); ii++) {

http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop
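
For reference, the idiom the linked section asks for looks roughly like the sketch below. It uses a plain std::vector rather than a BasicBlock so that it compiles without LLVM; countElements is a hypothetical helper invented for this illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: walks a container while evaluating end() only once.
// The end iterator is captured in E in the loop initializer, and the
// iterator is advanced with pre-increment.
std::size_t countElements(const std::vector<int> &Values) {
  std::size_t Count = 0;
  for (std::vector<int>::const_iterator I = Values.begin(), E = Values.end();
       I != E; ++I)
    ++Count;
  return Count;
}
```

The same shape applies to the header walk in the patch: capture header->end() once in the initializer and use ++ii instead of ii++.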

Nick

> This metadata is added by pocl's wiloops
> generation routine. It passes the pocl test suite when enabled but
> probably cannot vectorize many kernels (at least) due to the missing
> intra-kernel vector scalarizer.
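
The header scan described in the quoted text can be modelled in a few lines. This is only a toy sketch: ToyInstruction, ToyLoop and isAnnotatedParallel are hypothetical stand-ins made up for illustration, not LLVM classes and not the code from the attached patch. Only the metadata name 'parallel_for' comes from the message.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an instruction: an opcode name plus the set of
// metadata kind names attached to it. NOT llvm::Instruction.
struct ToyInstruction {
  std::string Opcode;
  std::set<std::string> MetadataKinds;
};

// Toy stand-in for a loop: only the header block's instructions matter here.
struct ToyLoop {
  std::vector<ToyInstruction> Header;
};

// Returns true if any header instruction carries 'parallel_for'.
// One surviving annotation is enough, which is why marking every
// header instruction tolerates optimizers dropping some of them.
bool isAnnotatedParallel(const ToyLoop &L) {
  for (std::vector<ToyInstruction>::const_iterator I = L.Header.begin(),
                                                   E = L.Header.end();
       I != E; ++I)
    if (I->MetadataKinds.count("parallel_for") != 0)
      return true;
  return false;
}
```

A loop whose header has lost every annotation is simply treated as non-parallel, so in this model a dropped annotation costs a missed vectorization, not a miscompile.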
>
> Some known problems that need addressing:
>
> - Metadata can only be attached to Instructions (not Loops or even
> BasicBlocks), hence the brute-force approach of marking all
> instructions in the header BB in the hope that the optimizers
> retain at least one of them. A special intrinsic call, for example,
> might be a better solution.
>
> - The loop header can potentially be shared between the levels of a
> multilevel loop where the outer or inner levels might not be parallel.
> This is not a problem in the pocl use case, as the wiloops are fully
> parallel at all three levels, but it needs to be sorted out in a
> general solution.
>
> Perhaps it would be better to attach the metadata to the iteration
> count increment/check instruction(s) or similar to better identify the
> parallel (for) loop in question.
>
> - Are there optimizations that might push code *illegally* into the parallel
> loop from outside it? If there's, e.g., a non-parallel loop inside
> a parallel loop, loop invariant code motion might move code from the
> inner loop to the parallel loop's body. That should be a safe optimization,
> to my understanding (it preserves the ordering semantics), but I wonder if
> there are others that might cause breakage.
>
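
The loop invariant code motion case from the last item can be illustrated with plain C++. The sketch below is a single hand-written example, not a proof that the transformation is safe in general; runOriginal and runHoistedReversed are hypothetical names, and the reversed outer loop is just one way to stand in for "the parallel loop may run its iterations in any order".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before hoisting: Factor is recomputed on every inner iteration.
std::vector<int> runOriginal(const std::vector<int> &In, int Scale, int Steps) {
  std::vector<int> Out(In.size());
  for (std::size_t WI = 0, E = In.size(); WI != E; ++WI) { // parallel loop
    int Acc = In[WI];
    for (int S = 0; S != Steps; ++S) { // sequential inner loop
      int Factor = Scale * 2;          // invariant with respect to S
      Acc += Factor + S;
    }
    Out[WI] = Acc;
  }
  return Out;
}

// After hoisting Factor into the parallel loop's body. The outer loop
// runs backwards to model a different iteration order.
std::vector<int> runHoistedReversed(const std::vector<int> &In, int Scale,
                                    int Steps) {
  std::vector<int> Out(In.size());
  for (std::size_t WI = In.size(); WI != 0; --WI) {
    int Factor = Scale * 2; // hoisted out of the inner loop
    int Acc = In[WI - 1];
    for (int S = 0; S != Steps; ++S) // inner order is unchanged
      Acc += Factor + S;
    Out[WI - 1] = Acc;
  }
  return Out;
}
```

Both versions produce the same output here because the hoisted value stays private to one outer iteration and the inner loop still runs in its original order.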