[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization

Thu Jan 24 09:47:31 PST 2013

Hi,

I started to play with the LoopVectorizer of LLVM trunk
on the work-item loops produced by pocl's OpenCL C
kernel compiler, in hopes of implementing multi-work-item
work group autovectorization in a modular manner.

The vectorizer seems to refuse to vectorize a loop if it sees
multiple writes to the same memory object within the
same iteration. In the case of parallel loops such as
the work-item loops, it could just assume vectorization is doable
from the data dependency point of view -- no matter what kind of
memory accesses a single iteration performs.
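
To make the case concrete, here is a rough sketch in plain C of the
kind of loop I mean. It is not actual pocl output; the names
kernel_body and run_work_group are made up for illustration, and the
kernel is assumed to contain no barriers.

```c
#include <stddef.h>

/* Hypothetical body of one work-item: two stores to the same
   buffer within a single iteration of the work-item loop. */
static void kernel_body(float *out, const float *in, size_t wi)
{
    out[2 * wi]     = in[wi] + 1.0f;
    out[2 * wi + 1] = in[wi] * 2.0f;
}

/* Hypothetical work-item loop for one work group. By OpenCL
   semantics the iterations are independent of each other, so the
   loop is parallel regardless of what a single iteration stores. */
void run_work_group(float *out, const float *in, size_t local_size)
{
    for (size_t wi = 0; wi < local_size; ++wi)
        kernel_body(out, in, wi);
}
```

Nothing in the loop itself tells the vectorizer that the iterations
are independent; that knowledge exists only on the kernel compiler
side.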

What would be the cleanest way to communicate the parallel loop
information to the vectorizer? There was some discussion of
parallelism information in LLVM on this list some time ago, but
it seems to have died out. Was adding parallelism information to
the LLVM IR decided to be a bad idea? Was any conclusion reached?

Another thing with OpenCL C autovectorization is that the
language itself has vector datatypes. In order to autovectorize
multi-WI work groups efficiently, it might be beneficial to
break the vectors within a single work-item into scalars to get more
efficient vector hardware utilization. Is there an existing pass
that breaks vectors into scalars and works at the LLVM IR level?
There seems to be one at the code generation level, according to
this blog post: http://blog.llvm.org/2011/12/llvm-31-vector-changes.html
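
As a rough illustration of what I mean by breaking the vectors into
scalars, here is a sketch in plain C. The float4_t struct is only a
stand-in for the OpenCL C float4 type, and the function names are
hypothetical; this is not how any existing pass is implemented.

```c
#include <stddef.h>

/* Stand-in for the OpenCL C float4 type (an assumption for this sketch). */
typedef struct { float x, y, z, w; } float4_t;

/* Component-wise multiply, as a float4 operation would behave. */
static float4_t mul4(float4_t a, float4_t b)
{
    float4_t r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}

/* Work-item loop as written by the kernel author: one float4
   operation per work-item. */
void wi_loop_vector(float4_t *out, const float4_t *a,
                    const float4_t *b, size_t local_size)
{
    for (size_t wi = 0; wi < local_size; ++wi)
        out[wi] = mul4(a[wi], b[wi]);
}

/* The same loop after scalarization: four scalar operations per
   work-item, each of which could then be widened across
   work-items by a loop vectorizer. */
void wi_loop_scalarized(float4_t *out, const float4_t *a,
                        const float4_t *b, size_t local_size)
{
    for (size_t wi = 0; wi < local_size; ++wi) {
        out[wi].x = a[wi].x * b[wi].x;
        out[wi].y = a[wi].y * b[wi].y;
        out[wi].z = a[wi].z * b[wi].z;
        out[wi].w = a[wi].w * b[wi].w;
    }
}
```

With the scalar form, the vector width would be chosen by the
vectorizer from the number of work-items rather than being fixed
at four by the source code.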

Thanks,
-- 
Pekka