[LLVMdev] Proposal: Generic auto-vectorization and parallelization approach for LLVM and Polly

ether zhhb etherzhhb at gmail.com
Thu Jan 6 00:38:39 PST 2011


Hi,

I just had a detailed look at the code of Polly[1], and it seems that Polly
is starting to support some basic auto-parallelization. I have some
ideas to improve the current auto-vectorization and parallelization
approach in Polly.

The main idea is to separate the transform passes from the codegen passes
for auto-parallelization and vectorization (Graphite[2] for GCC seems
to take a similar approach for auto-vectorization).

That means Polly (or any similar framework) should perform the necessary
code transformations, then just generate sequential code and provide the
necessary parallelism information (this information could be encoded as
metadata, just like TBAA); the later passes can then generate parallel
code from that parallelism information.
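
To make this concrete, here is a rough sketch (not actual Polly code) of
how a transform pass could attach such a flag to the latch branch of a
loop it has proven parallel. The metadata kind "parallel.loop" is just a
name made up for illustration; the mechanism is the same one used to
attach TBAA tags to memory instructions:

  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Metadata.h"

  using namespace llvm;

  // Mark a loop as parallel by tagging its latch branch. An empty
  // metadata node is enough to act as a flag; richer nodes could carry
  // dependence distances or the dimension that is parallel.
  static void markLoopParallel(BranchInst *Latch) {
    LLVMContext &Ctx = Latch->getContext();
    MDNode *Tag = MDNode::get(Ctx, {});
    Latch->setMetadata("parallel.loop", Tag);
  }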

The main benefits of separating the transform passes and codegen passes are:
1. It decouples the autopar framework from the various parallel runtime
environments, so we can keep both the autopar framework and the code
generation pass for a specific parallel runtime environment simple, and
we can support more parallel runtime environments easily.

2. It allows the generic parallelism information to live beyond the
specific autopar framework, so this information can benefit more passes
in LLVM. For example, the X86 and PTX backends could use it to perform
target-specific auto-vectorization (see the consumer-side sketch after
this list).
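
For example, a consumer of that information could look like this rough
sketch (again using the made-up "parallel.loop" kind); an OpenMP code
generator or a target-specific vectorizer can test the tag without
knowing anything about the framework that produced it:

  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Return true if some earlier pass has proven the loop with this
  // latch branch to be parallel.
  static bool isMarkedParallel(const BranchInst *Latch) {
    return Latch->getMetadata("parallel.loop") != nullptr;
  }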

Implementation consideration:

We may define some kind of generic "parallelism analysis", or a generic
version of the "LoopDependenceAnalysis" interface just like
AliasAnalysis, or we can encode the parallelism information as
metadata. Combining both should be fine, too.
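
Here is a very rough sketch of what such a generic interface might look
like, modelled on the way AliasAnalysis implementations chain to each
other; all the names and queries are hypothetical and only meant to show
the shape of the thing:

  #include "llvm/Analysis/LoopInfo.h"

  namespace llvm {

  // Hypothetical interface: producers (Polly, a simple
  // LoopDependenceAnalysis, ...) implement it, consumers query it
  // without caring which implementation answered.
  class ParallelismAnalysis {
    ParallelismAnalysis *Chain; // next provider to ask, may be null

  public:
    explicit ParallelismAnalysis(ParallelismAnalysis *Next = nullptr)
        : Chain(Next) {}
    virtual ~ParallelismAnalysis() {}

    // Can every iteration of L be executed independently? The
    // conservative default defers down the chain and finally says "no".
    virtual bool isParallelLoop(const Loop *L) {
      return Chain ? Chain->isParallelLoop(L) : false;
    }
  };

  } // end namespace llvm

The point is that Polly would be just one implementation of such an
interface; a much simpler dependence analysis could be another, and the
metadata encoding could be produced from either.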

Any comments are appreciated.

best regards
ether


[1] http://wiki.llvm.org/Polly
[2] http://gcc.gnu.org/wiki/Graphite/Parallelization


