[LLVMdev] Loop vectorizer
Ralf Karrenberg
Chareos at gmx.de
Wed Oct 17 00:13:08 PDT 2012
Hi everybody,
On 10/17/12 12:32 AM, Hal Finkel wrote:
>>> Do you have a plan for xforms to increase the amount of
>>> vectorization?
>>
>> Yes. We will need to implement a predication phase and to design the
>> interaction with other loop transformations. Also, this will have to
>> work well with the cost model. We also need to think of a good way to
>> detect early on if the transformations are likely to be effective,
>> because we currently don't have a good way of undoing compiler
>> transformations.
>>
>> I think that a simple if-converter will be a good place to start. What
>> do you think ?
>
> Quick comment: IIRC, Ralf Karrenberg has already implemented this (as part of his WFV project: https://github.com/karrenberg/wfv/tree/llvm_30). It might be worthwhile to work on cleaning up his implementation instead of starting from scratch.
>
> -Hal
WFV [1] does indeed include phases that correspond to full control-flow
to data-flow conversion (not just if-conversion; it can flatten all
kinds of control flow, including nested loops with multiple exits).
I am currently working on a full re-implementation of the WFV algorithm
on top of the latest trunk.
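
To illustrate the simplest case of control-flow to data-flow conversion (plain if-conversion), here is a minimal sketch in Python. It is not taken from WFV; the names scalar_kernel, vector_kernel and select are made up for this example, and lists stand in for vector registers. Both sides of the branch are evaluated for all lanes, and a per-lane mask picks the result.

```python
from typing import List


def scalar_kernel(x: int) -> int:
    # Original control flow: the branch taken depends on the input value.
    if x > 0:
        return x * 2
    else:
        return -x


def select(mask: List[bool], a: List[int], b: List[int]) -> List[int]:
    # Per-lane blend: take a[i] where mask[i] is true, otherwise b[i].
    return [ai if m else bi for m, ai, bi in zip(mask, a, b)]


def vector_kernel(xs: List[int]) -> List[int]:
    # Control flow becomes data flow: compute a mask, evaluate both
    # sides of the branch for every lane, then blend the results.
    mask = [x > 0 for x in xs]
    then_val = [x * 2 for x in xs]
    else_val = [-x for x in xs]
    return select(mask, then_val, else_val)
```

Loops with multiple exits need more machinery than this (masks that track which lanes are still active), which the sketch does not attempt.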
One part of it that is basically finished is an analysis pass that I
call "vectorization analysis", which annotates a function (WFV works on
entire functions) with metadata used during control-flow to data-flow
conversion and instruction vectorization.
To give you a broad idea, this includes information like:
- uniform/varying operation
- same/consecutive/random index vector (for load/store)
- aligned/unaligned index vector (for load/store)
- operations that cannot be vectorized (marked as "split", e.g.
non-vectorizable types)
- operations that need to be split and guarded (e.g. unknown calls, stores)
- mandatory/optional blocks (renamed from "divergent"/"non-divergent" in
[2])
- divergent/non-divergent loops
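
As a rough illustration of the first item only (uniform/varying), the following Python sketch propagates a "varying" mark through a toy straight-line instruction list. The instruction encoding and the name mark_varying are hypothetical and not part of WFV. It only follows data dependences; a real analysis would also have to account for control flow, which this sketch ignores.

```python
from typing import Dict, List, Set, Tuple

# Toy instruction: (result name, opcode, operand names). Hypothetical encoding.
Instr = Tuple[str, str, List[str]]


def mark_varying(instrs: List[Instr], varying_inputs: Set[str]) -> Dict[str, str]:
    # A value is varying if any of its operands is varying; everything
    # else stays uniform. Iterate until no mark changes (fixed point).
    varying = set(varying_inputs)
    changed = True
    while changed:
        changed = False
        for dest, _op, operands in instrs:
            if dest not in varying and any(o in varying for o in operands):
                varying.add(dest)
                changed = True
    return {
        dest: ("varying" if dest in varying else "uniform")
        for dest, _op, _operands in instrs
    }


# Example: "tid" differs per lane, "n" and "c" are the same for all lanes.
example = [
    ("a", "mul", ["n", "c"]),
    ("b", "add", ["tid", "a"]),
    ("v", "load", ["b"]),
    ("d", "add", ["a", "n"]),
]
marks = mark_varying(example, {"tid"})
```

In this example, a and d depend only on uniform inputs and stay uniform, while b and v depend on tid and are marked varying.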
Generally, it would be possible to implement a loop vectorizer on top of
WFV simply by running a loop dependency analysis to determine if the
loop in question is vectorizable, extracting the loop body into a
function, running WFV on it, and inlining the call again.
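
The four steps above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not how WFV or LLVM does it: the dependency analysis is reduced to a boolean flag, vectorize is a stand-in that just applies the body per lane, the width W is arbitrary, and the scalar remainder loop is my own addition for trip counts that are not a multiple of W.

```python
from typing import Callable, List

W = 4  # hypothetical vector width


def body(x: int) -> int:
    # Step 2: the loop body, extracted into its own function.
    return x * x + 1


def vectorize(f: Callable[[int], int]) -> Callable[[List[int]], List[int]]:
    # Step 3: stand-in for running a function vectorizer on the body.
    # Here it simply applies f to each lane of the input "vector".
    def wide(xs: List[int]) -> List[int]:
        return [f(x) for x in xs]
    return wide


def run_loop(data: List[int], has_loop_carried_dependence: bool) -> List[int]:
    out = [0] * len(data)
    if has_loop_carried_dependence:
        # Step 1 says "not vectorizable": keep the scalar loop.
        for i in range(len(data)):
            out[i] = body(data[i])
        return out
    # Step 4: the vectorized body is called once per W iterations
    # (in a compiler, this call would then be inlined again).
    wide_body = vectorize(body)
    main_end = len(data) - len(data) % W
    for i in range(0, main_end, W):
        out[i:i + W] = wide_body(data[i:i + W])
    # Leftover iterations run the scalar body.
    for i in range(main_end, len(data)):
        out[i] = body(data[i])
    return out
```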
I am willing to provide all of my implementation as soon as required.
I hope to have mostly finished the rewrite at that point.
Cheers,
Ralf
[1] "Whole-Function Vectorization", Karrenberg and Hack, CGO'11
[2] "Improving Performance of OpenCL on CPUs", Karrenberg and Hack, CC'12