[LLVMdev] Loop vectorizer

Wed Oct 17 00:13:08 PDT 2012

Hi everybody,

On 10/17/12 12:32 AM, Hal Finkel wrote:
>>> Do you have a plan for xforms to increase the amount of
>>> vectorization?
>>
>> Yes. We will need to implement a predication phase and to design the
>> interaction with other loop transformations. Also, this will have to
>> work well with the cost model. We also need to think of a good way to
>> detect early on if the transformations are likely to be effective,
>> because we currently don't have a good way of undoing compiler
>> transformations.
>>
>> I think that a simple if-converter will be a good place to start. What
>> do you think?
>
> Quick comment: IIRC, Ralf Karrenberg has already implemented this (as part of his WFV project: https://github.com/karrenberg/wfv/tree/llvm_30). It might be worthwhile to work on cleaning up his implementation instead of starting from scratch.
>
>   -Hal

WFV [1] does indeed include phases that perform full control-flow to
data-flow conversion. This goes beyond if-conversion: it can flatten all
kinds of control flow, including nested loops with multiple exits.
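
To illustrate the simplest case of control-flow to data-flow conversion
(plain if-conversion), here is a minimal hand-written sketch. It is not
WFV code; the function names and the example computation are assumptions
chosen for illustration only. The branch is replaced by computing both
paths and blending the results with a mask, which is the form a
vectorizer can later map to a vector select.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: control flow decides which value each element gets.
std::vector<float> with_branches(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0.0f)
            out[i] = x[i] * 2.0f;
        else
            out[i] = -x[i];
    }
    return out;
}

// If-converted form: both paths are evaluated unconditionally and a
// mask selects the result, so the loop body is straight-line code.
// (Hypothetical illustration; only valid when both paths are safe to
// execute speculatively, as they are here.)
std::vector<float> if_converted(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    assert(out.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const bool mask = x[i] > 0.0f;       // former branch condition
        const float then_val = x[i] * 2.0f;  // former "then" block
        const float else_val = -x[i];        // former "else" block
        out[i] = mask ? then_val : else_val; // blend (select)
    }
    return out;
}
```

Operations with side effects (stores, unknown calls) cannot simply be
executed on both paths, which is why they need guarding rather than a
plain blend.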

I am currently working on a full re-implementation of the WFV algorithm 
on top of the latest trunk.
One part of it that is basically finished is an analysis pass that I 
call "vectorization analysis", which annotates a function (WFV works on 
entire functions) with metadata used during control-flow to data-flow 
conversion and instruction vectorization.
To give you a broad idea, this includes information like:
- uniform/varying operation
- same/consecutive/random index vector (for load/store)
- aligned/unaligned index vector (for load/store)
- operations that cannot be vectorized (marked as "split", e.g.
those with non-vectorizable types)
- operations that need to be split and guarded (e.g. unknown calls, stores)
- mandatory/optional blocks (renamed from "divergent"/"non-divergent" in 
[2])
- divergent/non-divergent loops
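
As a rough illustration of what the index-vector properties in this
list mean, the sketch below classifies a concrete vector of per-lane
indices. This is an assumption-laden toy: the real analysis derives
these properties statically, the names `IndexKind`, `classify_indices`
and `is_aligned` are hypothetical, and "aligned" is assumed here to mean
that the first lane's index is a multiple of the vector width.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names mirroring the same/consecutive/random distinction.
enum class IndexKind { Same, Consecutive, Random };

// Classify the indices accessed by the lanes of one vector operation.
IndexKind classify_indices(const std::vector<long>& lanes) {
    bool same = true;
    bool consecutive = true;
    for (std::size_t i = 1; i < lanes.size(); ++i) {
        if (lanes[i] != lanes[0]) same = false;
        if (lanes[i] != lanes[i - 1] + 1) consecutive = false;
    }
    if (same) return IndexKind::Same;               // one scalar access suffices
    if (consecutive) return IndexKind::Consecutive; // candidate for a vector load/store
    return IndexKind::Random;                       // needs per-lane accesses
}

// Assumed definition: the first index is a multiple of the lane count.
bool is_aligned(const std::vector<long>& lanes) {
    return !lanes.empty() &&
           lanes[0] % static_cast<long>(lanes.size()) == 0;
}
```

For example, lanes accessing indices 8, 9, 10, 11 would be classified
as consecutive and aligned for a width of 4, while 9, 10, 11, 12 would
be consecutive but unaligned.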

Generally, it would be possible to implement a loop vectorizer on top of 
WFV simply by running a loop dependency analysis to determine if the 
loop in question is vectorizable, extracting the loop body into a 
function, running WFV on it, and inlining the call again.
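
The steps above can be sketched by hand on a trivial loop. This is only
an illustration of the shape of each step, not output of WFV: the vector
width `W`, the function names, and the scalar remainder loop are all
assumptions, and the "vectorized" body is written as a fixed-width
inner loop standing in for real vector instructions. The dependency
check (step 1) is trivially satisfied because each iteration writes only
its own element of `c`.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t W = 4;  // assumed vector width

// Step 2: the loop body, extracted into a function of the
// induction variable.
inline void body_scalar(const float* a, const float* b, float* c,
                        std::size_t i) {
    c[i] = a[i] + b[i];
}

// Step 3: stand-in for what a function vectorizer would produce:
// one call now covers W consecutive iterations.
inline void body_vector(const float* a, const float* b, float* c,
                        std::size_t i) {
    for (std::size_t lane = 0; lane < W; ++lane)
        c[i + lane] = a[i + lane] + b[i + lane];
}

// Step 4: the loop after the vectorized body is put back in place.
// The scalar remainder loop is an addition for trip counts that are
// not a multiple of W.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<float> c(n);
    std::size_t i = 0;
    for (; i + W <= n; i += W)
        body_vector(a.data(), b.data(), c.data(), i);
    for (; i < n; ++i)
        body_scalar(a.data(), b.data(), c.data(), i);
    return c;
}
```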

I am happy to share my entire implementation whenever it is needed.
I hope to have mostly finished the rewrite by then.

Cheers,
Ralf

[1] "Whole-Function Vectorization", Karrenberg and Hack, CGO'11
[2] "Improving Performance of OpenCL on CPUs", Karrenberg and Hack, CC'12