[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization

Fri Jan 25 13:54:25 PST 2013

> I am in favor of adding metadata to control different aspects of
> vectorization, mainly for supporting user-level pragmas [1] but also for
> DSLs. Before we start adding metadata to the IR we need to define the
> semantics of the tags. "Parallel_for" is too general. We also want to control
> vectorization factor, unroll factor, cost model, etc.

Those control *how* the loops are parallelized.
The generic "parallel_for" lets the compiler make (or at least try to make)
the actual parallelization decisions based on the target, aiming for
performance portability. So both have their uses.

> Doug Gregor suggested to add the metadata to the branch instruction of the
> latch block in the loop.

OK, that should work better. I'll look into it next week.

> My main concern is that your approach for vectorizing OpenCL is wrong. OpenCL
> was designed for SPMD/outer-loop vectorization and any good OpenCL vectorizer
> should be able to vectorize 100% of the workloads.  The Loop Vectorizer
> vectorizes innermost loops only. It has a completely different cost model and
> legality checks. You also have no use for reduction variables, reverse
> iterators, etc. If all you are interested in is the widening of instructions
> then you can easily implement it.

Sorry, I still don't see the problem with the "modular" approach vs. generating
vector instructions directly in pocl -- but then again, I'm not a vectorization
expert. All I'm really trying to do is delegate the "widening of
instructions" and the related tasks to the loop vectorizer. If this use doesn't
need all of the vectorizer's features, that should not be a problem AFAIU. I
think it's better for me to just play with it a bit and see what problems come
up in practice.
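
A rough sketch of the idea, in plain C rather than anything pocl actually
generates (the function names and the vector-add kernel are hypothetical,
picked only for illustration): the kernel body is wrapped in a loop over the
work-items of a work group, and that loop is what I would like to hand to the
loop vectorizer for widening.

```c
#include <stddef.h>

/* Hypothetical kernel body: the work done by a single work-item.
   Stands in for an OpenCL C kernel computing c[i] = a[i] + b[i]. */
static void vec_add_work_item(const float *a, const float *b, float *c,
                              size_t local_id)
{
    c[local_id] = a[local_id] + b[local_id];
}

/* Hypothetical work-group function: runs the kernel body once per
   work-item. Iterations are independent by OpenCL semantics, so this
   is the loop that "parallel" metadata would mark. Without such a
   hint the vectorizer has to prove, or check at run time, that c does
   not alias a or b before it can widen the loads and stores. */
void vec_add_work_group(const float *a, const float *b, float *c,
                        size_t local_size)
{
    for (size_t i = 0; i < local_size; ++i)
        vec_add_work_item(a, b, c, i);
}
```

No reductions or reverse iterators are involved here; only the widening of
the loop body is needed.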

-- 
--Pekka