[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization

Thu Jan 24 23:56:19 PST 2013

Hi Pekka, 

> Hi,
> 
> I started to play with the LoopVectorizer of LLVM trunk
> on the work-item loops produced by pocl's OpenCL C
> kernel compiler, in hopes of implementing multi-work-item
> work group autovectorization in a modular manner.
> 

Thanks for checking out the Loop Vectorizer. I am interested in hearing your feedback. However, the Loop Vectorizer does not fit here. OpenCL vectorization is completely different because the language itself is data-parallel, so you don't need all of the legality checks that the loop vectorizer has. Moreover, OpenCL has lots of language-specific APIs such as "get_global_id" and builtin function calls, and without knowledge of these calls it is impossible to vectorize OpenCL.
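
For reference, a work-item loop has roughly the following shape. This is only a plain C sketch for a 1-D work group with a zero global offset, not pocl's actual output. The names `saxpy_work_item` and `saxpy_work_group` are made up for illustration, and the global id is passed in as a parameter where an OpenCL kernel would call get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body for a single work-item. The global id arrives as
   a parameter instead of being read through get_global_id(0). */
static void saxpy_work_item(size_t gid, float a, const float *x, float *y) {
    y[gid] = a * x[gid] + y[gid];
}

/* Hypothetical work-item loop for one 1-D work group (global offset assumed
   to be zero). Each iteration is one work-item, so the iterations are
   independent by the language's definition, not by a dependence proof. */
static void saxpy_work_group(size_t group_id, size_t local_size,
                             float a, const float *x, float *y) {
    assert(local_size > 0);
    for (size_t lid = 0; lid < local_size; ++lid) {
        saxpy_work_item(group_id * local_size + lid, a, x, y);
    }
}
```

A vectorizer that knows what get_global_id means can treat consecutive values of `lid` as lanes directly; a generic loop vectorizer only sees the loop and has to rediscover that independence.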

> The vectorizer seems to refuse to vectorize the loop if it sees
> multiple writes to the same memory object within the
> same iteration. In case of parallel loops such as
> the work-item loops, it could just assume vectorization is doable
> from the data dependency point of view -- no matter what kind of
> memory accesses the single iteration does.
> 

Yep. 
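
A small plain C sketch of the situation you describe, with made-up function names: the loop body stores to `out[i]` twice in one iteration. If the iterations are known to be parallel, the loop can be widened by hand (here by 4, assuming `n` is a multiple of 4) without any dependence analysis. A case this simple can also be proven safe by analysis; the point is that a parallel loop makes the proof unnecessary.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar work-item loop: two stores to the same memory object per iteration. */
static void scale_then_bias(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 2.0f;   /* first write */
        out[i] = out[i] + 1.0f;  /* second write to the same location */
    }
}

/* Hand-widened version: each inner loop stands in for one 4-wide vector
   operation. All four first writes happen before any second write, which is
   a reordering that is only obviously legal because the iterations are
   independent. */
static void scale_then_bias_x4(const float *in, float *out, size_t n) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            out[i + lane] = in[i + lane] * 2.0f;
        }
        for (size_t lane = 0; lane < 4; ++lane) {
            out[i + lane] = out[i + lane] + 1.0f;
        }
    }
}
```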

> What would be the cleanest way to communicate the parallel loop
> information to the vectorizer? There was some discussion of
> parallelism information in LLVM some time ago in this list, but
> it somehow died. Was adding some parallelism information to
> the LLVM IR decided to be a bad idea? Any conclusion in that?
> 

You need to implement something like Whole Function Vectorization (http://dl.acm.org/citation.cfm?id=2190061). The loop vectorizer can't help you here. Ralf Karrenberg open-sourced his implementation on GitHub. You should take a look.

> Another thing with OpenCL C autovectorization is that the
> language itself has vector datatypes.

Unfortunately, yes. And OpenCL compilers scalarize these vector operations at some point in the compilation pipeline.

> In order to autovectorize
> multi-WI work groups efficiently, it might be beneficial to
> break the vectors in the single work item to scalars to get more
> efficient vector hardware utilization. Is there an existing pass
> that breaks vectors to scalars and that works on the LLVM IR level?

No. But this pass needs to be OpenCL specific because you want to scalarize function calls. OpenCL is "blessed" with lots of function calls, even for trivial type conversions.
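
To illustrate what scalarizing a function call means, here is a plain C sketch. It is not an LLVM pass, and both function names are hypothetical stand-ins: `convert_int_scalar` plays the role of OpenCL's convert_int on a float, and `convert_int_n_scalarized` shows what a vector conversion such as convert_int4 would become after scalarization, namely one scalar call per lane.

```c
#include <assert.h>

/* Stand-in for the scalar builtin convert_int(float). The C cast truncates
   toward zero, which matches OpenCL's default rounding for conversions to
   integer types. */
static int convert_int_scalar(float v) {
    return (int)v;
}

/* Scalarized form of a vector conversion: the single vector call is replaced
   by one scalar call per lane. The pass has to know which scalar builtin
   corresponds to which vector builtin, which is OpenCL-specific knowledge. */
static void convert_int_n_scalarized(const float *v, int *r, int width) {
    /* OpenCL vector types come in widths 2, 3, 4, 8 and 16. */
    assert(width == 2 || width == 3 || width == 4 || width == 8 || width == 16);
    for (int lane = 0; lane < width; ++lane) {
        r[lane] = convert_int_scalar(v[lane]);
    }
}
```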

> There seems to be such at the code gen level according to
> this blog post: http://blog.llvm.org/2011/12/llvm-31-vector-changes.html

Yes, but you can't use it because you need to do this at the IR level.

- Nadav

> 
> Thanks,
> -- 
> Pekka