[LLVMdev] GSoC 2009: Auto-vectorization

Tue Mar 31 23:18:28 PDT 2009

On Mar 31, 2009, at 5:27 PM, Andreas Bolka wrote:
> Hi all,
> I'd like to have a first stab at a loop-based auto-vectorization  
> pass as
> part of 2009's Google Summer of Code program. As far as I can tell  
> from
> searching the mailing list archives, no work on such an auto- 
> vectorizer
> seems to be currently in progress.

Hi Andreas,

This would be a very interesting project!  The most important part of  
the proposal is to break it down into small pieces.

> Whereas auto-vectorization is a well-researched topic, complexity  
> seems
> to quickly explode once more advanced loop constructs are to be
> supported. Therefore my aim is to start with the most minimal
> implementation possible, to explore the difficulties encountered in
> the specific context of LLVM and to build a foundation from which  
> future
> work can progress.

This is a great approach.  I'm very much in favor of solving small  
problems first, instead of trying to solve all the worlds issues at  
once. :)

> So, initially, I aim at supporting only the simplest loops such as:
>
>    int a[256], b[256], c[256];
>    for (int i = 0; i < 256; ++i)
>      c[i] = a[i] + b[i];
>
> My goal is to implement the necessary analyses and transformations to
> turn IR corresponding to such code into IR utilizing vector
> instructions;

Sounds great.  Some important steps:

1. We need an abstract dependence analysis interface (like we have for  
AA) and one implementation (probably handling trivial loops like the  
one above to start with).  Many people have talked about this, but no  
code has gone in yet.

2. We need some interface that can be implemented by a target machine  
to describe what the vector capabilities of the machine are.

3. We need the transformation code.  Starting with a simple loop  
vectorizor (as opposed to an SLP system) would make sense to me.

Once the basics are in place, it can be feature-crept to support new  
idioms (reductions, alignment analysis, etc).  The first cut doesn't  
need to provide an immediate speedup, but should have many testcases  
that demonstrates the loop forms it can handle.

> i.e. the core of the desired result would look like:
>
>    %va = load <256 x i32>* %a
>    %vb = load <256 x i32>* %b
>    %vc = add <256 x i32> %a, %b
>    store <256 x i32> %vc, <256 x i32>* %c

Actually, IMO this is probably not the directly we want to go.   
Vectorizing to extremely wide vectors like this isn't along the lines  
of what we want: we should get something like the original loop  
unrolled 4 times to do operations on <4 x i32>'s (for x86/ppc).

> Auto-vectorization has numerous dependencies on other
> analysis/transformation passes (loop normalization, constant/scalar
> expansion, dependence analysis, etc). A significant part of the  
> proposed
> project is to evaluate already-available LLVM passes regarding their
> applicability to support auto-vectorization and to extend/implement
> what's missing.

Sounds good.  I'm also fine with it only working in carefully  
controlled circumstances.  Depending on how ambitious you are, you  
could also do a GSoC project just around a) building dependence  
analysis interface + some *non*-trivial implementations, and b) build  
some dep analysis clients around it (e.g. loop interchange, etc).   
This may be a less risky and higher impact project in the short term,  
if you are interested in it.

> As for my personal background, I'm a graduate student at Vienna
> University of Technology (TU Wien) with a strong background in  
> compiler
> theory, acquired in a wide variety of undergraduate- and graduate- 
> level
> courses. I recently implemented two simple compilers LLVM and  
> conducted
> various experiments using LLVM's vector support. My general interest  
> in
> data-parallel computation stems from a great fondness of APL  
> (especially
> of its modern offspring K). While I'm familiar with the concepts  
> behind
> auto-vectorization, I haven't implemented an auto-vectorizer before.  
> I'm
> author of and contributor to several smaller open source projects.

Great!  Either approach (building a very simple auto-vec system from  
end-to-end, or building some non-trivial dep analysis implementations  
+ a client or two) sound like very promising GSoC projects!

-Chris