[LLVMdev] Proposal: Generic auto-vectorization and parallelization approach for LLVM and Polly

Tobias Grosser grosser at fim.uni-passau.de
Sat Jan 8 10:27:41 PST 2011


On 01/06/2011 10:59 AM, Renato Golin wrote:
> On 6 January 2011 15:16, Tobias Grosser<grosser at fim.uni-passau.de>  wrote:
>>> The main idea is, we separate the transform passes and codegen passes
>>> for auto-parallelization and vectorization (Graphite[2] for gcc seems
>>> to be taking a similar approach for auto-vectorization).
>
> I agree with Ether.
>
> A two-stage vectorization would allow you to use the simple
> loop-unroller already in place to generate vector/mp intrinsics from
> them, and if more parallelism is required, use the expensive Poly
> framework to skew loops and remove dependencies, so the loop-unroller
> and other cheap bits can do their job where they couldn't before.
>
> So, in essence, this is a three-stage job: the optional heavy-duty
> Poly analysis, the cheap loop-optimizer, and the mp/vector
> transformation pass. The best feature of having all three is being
> able to choose the level of vectorization you want and to re-use the
> current loop analysis in the scheme.

OK. First of all, to agree on a name: we decided to call the polyhedral 
analysis we develop PoLLy, as in Polly the parrot. ;-) Maybe it was a 
misleading choice?

In general, as I explained, I agree that a three-stage approach is
useful for the reasons you gave; however, it means more overhead (and
simply more implementation work) than the approach we use now. I
currently do not have the time to implement the proposed approach. In
case anybody is interested in working on patches, I am happy to support
this.

>> What other types of parallelism are you expecting? We currently support
>> thread level parallelism (as in OpenMP) and vector level parallelism (as
>> in LLVM-IR vectors). At least for X86 I do not see any reason for
>> target specific auto-vectorization as LLVM-IR vectors are lowered
>> extremely well to x86 SIMD instructions. I suppose this is the same for
>> all CPU targets. I still need to look into GPU targets.
>
> I'd suggest trying to transform sequential instructions into vector
> instructions (in the third stage) if proven to be correct.
>
> So, when Poly skews a loop, and the loop analysis unrolls it to, say,
> 4 calls to the same instruction, a metadata binding them together can
> hint the third stage to make that a vector operation with the same
> semantics.

I know, this is the classical approach to vector code generation. The 
difference in Polly is that we do not have a loop represented in 
LLVM-IR that we would like to vectorize; instead we have a loop body
and its contents, which we want to emit as vector code. So rather than
creating the LLVM-IR loop structure, writing metadata, unrolling the
loop and then merging the scalar instructions into vector instructions,
the only change in Polly is that it either generates N scalar
instructions per original instruction or a single vector instruction
(where N is the number of loop iterations, which is equal to the vector
width). As a result, vectorization in Polly was very easy to implement
and already works reasonably well.
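
To make the contrast concrete, here is a rough sketch of the two ways
the body of a loop containing a single scalar add could be emitted. It
is purely illustrative (the function and variable names are made up,
this is not Polly's actual code generator, and the API shown is the
LLVM 2.x-era IRBuilder), assuming the trip count N is known at code
generation time:

  // Illustrative sketch only, not Polly's code generator.
  #include "llvm/Support/IRBuilder.h"   // header location in LLVM 2.x
  using namespace llvm;

  // Emit the body either as N scalar adds (one per iteration) or, when
  // N equals the vector width, as a single LLVM-IR vector add.
  Value *emitBody(IRBuilder<> &B, Value *LHS[], Value *RHS[],
                  unsigned N, unsigned VectorWidth) {
    if (N == VectorWidth) {
      const VectorType *VecTy = VectorType::get(LHS[0]->getType(), N);
      Value *VL = UndefValue::get(VecTy), *VR = UndefValue::get(VecTy);
      for (unsigned i = 0; i != N; ++i) {
        VL = B.CreateInsertElement(VL, LHS[i], B.getInt32(i));
        VR = B.CreateInsertElement(VR, RHS[i], B.getInt32(i));
      }
      return B.CreateAdd(VL, VR, "body.vec");   // one vector instruction
    }
    Value *Last = 0;
    for (unsigned i = 0; i != N; ++i)           // N scalar instructions
      Last = B.CreateAdd(LHS[i], RHS[i], "body.scalar");
    return Last;
  }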

>> LLVM-IR vector instructions however are generic SIMD
>> instructions so I do not see any reason to create target specific
>> auto vectorizer passes.
>
> If you're assuming the original code is using intrinsics, that is
> correct. But if you want to generate the vector code from Poly, then
> you need to add that support, too.

Why are target-specific vectorization passes needed to generate vector 
instructions from Polly? The only target-specific information I 
currently see is the vector width, which a generic vectorization pass 
can obtain from the target data information. Could you explain which 
features would require target-specific vectorization?
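
As a sketch of the kind of computation I mean (the function name is
hypothetical and the fixed 128-bit register size is an assumption that
matches SSE and NEON, standing in for whatever target information the
pass consults), the vector width for a given element type could be
derived roughly like this:

  #include "llvm/Target/TargetData.h"
  using namespace llvm;

  // Sketch: how many elements of ElemTy fit into one SIMD register.
  // The 128-bit register size is an assumption; it is the only
  // target-specific fact used here, the element size comes from the
  // generic target data.
  unsigned chooseVectorWidth(const TargetData &TD, const Type *ElemTy) {
    const unsigned SIMDRegisterBits = 128;
    return SIMDRegisterBits / TD.getTypeSizeInBits(ElemTy);
  }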

> ARM also has good vector instruction selection (on Cortex-A* with
> NEON), so you also get that for free. ;)
I have read about these and they look interesting. I suppose they are 
created out of the box if a pass generates LLVM-IR vector instructions?

> cheers,
> --renato

Thanks for your comments

Tobi


