[LLVMdev] GSoC 2009: Auto-vectorization

Nick Lewycky nicholas at mxc.ca
Tue Mar 31 22:05:36 PDT 2009


Andreas Bolka wrote:
> Hi all,
> 
> I'd like to have a first stab at a loop-based auto-vectorization pass as
> part of 2009's Google Summer of Code program. As far as I can tell from
> searching the mailing list archives, no work on such an auto-vectorizer
> seems to be currently in progress.

Hi Andreas,

Actually, you'd be the third person to try writing one, with myself 
being the second. :) Don't let that discourage you; I don't think 
anyone's actively working on it now.

> While auto-vectorization is a well-researched topic, complexity seems
> to quickly explode once more advanced loop constructs are to be
> supported. Therefore my aim is to start with the most minimal
> implementation possible, to explore the difficulties encountered in
> the specific context of LLVM and to build a foundation from which future
> work can progress.

There are two types of autovectorization: loop-level vectorization and 
SLP (superword-level parallelism). SLP is loop-ignorant 
autovectorization of straight-line code, which in practice often 
turns out to be the code that runs inside the loop.

> So, initially, I aim at supporting only the simplest loops such as:
> 
>     int a[256], b[256], c[256];
>     for (int i = 0; i < 256; ++i)
>       c[i] = a[i] + b[i];
> 
> My goal is to implement the necessary analyses and transformations to
> turn IR corresponding to such code into IR utilizing vector
> instructions; i.e. the core of the desired result would look like:
> 
>     %va = load <256 x i32>* %a
>     %vb = load <256 x i32>* %b
>     %vc = add <256 x i32> %va, %vb
>     store <256 x i32> %vc, <256 x i32>* %c
> 
> After this transformation, the code could be left as is (which would
> generate a fully unrolled vector-add, as long as the vector length
> doesn't exceed the lowering capabilities of the code generator), or
> strip-mined into a loop using fixed-size vectors (preferably
> corresponding to the SIMD register width of the target architecture).
> 
> Auto-vectorization has numerous dependencies on other
> analysis/transformation passes (loop normalization, constant/scalar
> expansion, dependence analysis, etc). A significant part of the proposed
> project is to evaluate already-available LLVM passes regarding their
> applicability to support auto-vectorization and to extend/implement
> what's missing.

There's no loop dependence analysis in LLVM yet. How are you planning to 
implement it? Can you point to a paper, or write up a plain English 
description?

> As for my personal background, I'm a graduate student at Vienna
> University of Technology (TU Wien) with a strong background in compiler
> theory, acquired in a wide variety of undergraduate- and graduate-level
> courses. I recently implemented two simple compilers using LLVM and conducted
> various experiments using LLVM's vector support. My general interest in
> data-parallel computation stems from a great fondness for APL (especially
> its modern offspring K). While I'm familiar with the concepts behind
> auto-vectorization, I haven't implemented an auto-vectorizer before. I'm
> author of and contributor to several smaller open source projects.
> 
> I appreciate any suggestions and would be very excited if someone is
> interested in mentoring this.

I'm interested, but I'd like to see more details first about exactly how 
(i.e., with what algorithm) you intend to find and transform code.

Nick
