[LLVMdev] Proposal: Generic auto-vectorization and parallelization approach for LLVM and Polly
Tobias Grosser
grosser at fim.uni-passau.de
Sat Jan 8 10:27:41 PST 2011
On 01/06/2011 10:59 AM, Renato Golin wrote:
> On 6 January 2011 15:16, Tobias Grosser<grosser at fim.uni-passau.de> wrote:
>>> The main idea is, we separate the transform passes and codegen passes
> >>> for auto-parallelization and vectorization (Graphite[2] for gcc seems
> >>> to be taking a similar approach for auto-vectorization).
>
> I agree with Ether.
>
> A two-stage vectorization would allow you to use the simple
> loop-unroller already in place to generate vector/mp intrinsics from
> them, and if more parallelism is required, use the expensive Poly
> framework to skew loops and remove dependencies, so the loop-unroller
> and other cheap bits can do their job where they couldn't before.
>
> So, in essence, this is a three-stage job: the optional heavy-duty
> Poly analysis, the cheap loop optimizer and the mp/vector
> transformation pass. The best feature of having all three is being
> able to choose the level of vectorization you want and to reuse the
> current loop analysis in the scheme.
OK. First of all, to agree on a name: we decided to call the polyhedral
analysis we are developing PoLLy, as in Polly the parrot. ;-) Maybe it was
a misleading choice?
In general, as I explained, I agree that a three-stage approach is useful
for the reasons you gave; however, it means more overhead (and simply more
implementation work) than the approach we use now. I currently do not have
the time to implement the proposed approach. In case anybody is
interested in working on patches, I am happy to support this.
>> What other types of parallelism are you expecting? We currently support
>> thread level parallelism (as in OpenMP) and vector level parallelism (as
>> in LLVM-IR vectors). At least for X86 I do not see any reason for
>> target specific auto-vectorization as LLVM-IR vectors are lowered
>> extremely well to x86 SIMD instructions. I suppose this is the same for
>> all CPU targets. I still need to look into GPU targets.
>
> I'd suggest to try and transform sequential instructions into vector
> instructions (in the third stage) if proven to be correct.
>
> So, when Poly skews a loop, and the loop analysis unrolls it to, say,
> 4 calls to the same instruction, a metadata binding them together can
> hint the third stage to make that a vector operation with the same
> semantics.
I know, this is the classical approach to vector code generation. The
difference in Polly is that we do not have a loop represented in
LLVM-IR that we would like to vectorize; instead, we have a loop body
and its contents that we want to emit as vector code. So instead of
creating the LLVM-IR loop structure, writing metadata, unrolling the loop
and then merging scalar instructions into vector instructions, the only
change in Polly is that it either generates N scalar instructions per
original instruction or one vector instruction (where N is the number of
loop iterations, which is equal to the vector width). Vectorization in
Polly was therefore very easy to implement and already works
reasonably well.
>> LLVM-IR vector instructions however are generic SIMD
>> instructions so I do not see any reason to create target specific
>> auto vectorizer passes.
>
> If you're assuming the original code is using intrinsics, that is
> correct. But if you want to generate the vector code from Poly, then
> you need to add that support, too.
Why are target specific vectorization passes needed to generate vector
instructions from Polly? The only target specific information I
currently see is the vector width, which a generic vectorization pass
can obtain from the target data information. Could you explain for which
features target specific vectorization would be needed?
> ARM also has good vector instruction selection (on Cortex-A* with
> NEON), so you also get that for free. ;)
I have read about this and they look interesting. I suppose these
instructions are selected out of the box if a pass generates LLVM-IR
vector instructions?
> cheers,
> --renato
Thanks for your comments
Tobi