[LLVMdev] GSoC 2009: Auto-vectorization

Wed Apr 1 14:54:47 PDT 2009

Hi Chris,

On Wed Apr 01 08:18:28 +0200 2009, Chris Lattner wrote:
> On Mar 31, 2009, at 5:27 PM, Andreas Bolka wrote:
> > My goal is to implement the necessary analyses and transformations to
> > turn IR corresponding to such code into IR utilizing vector
> > instructions;
> 
> Sounds great.  Some important steps:

> 1. We need an abstract dependence analysis interface (like we have for
> AA) and one implementation (probably handling trivial loops like the
> one above to start with).  Many people have talked about this, but no
> code has gone in yet.

My original plan was to implement a loop dependence analysis only for
non-nested loops using the "GCD" method of determining interdependent
statements. This limits the array subscript expressions to simple linear
equations of the form x1*i + x0 where i is the induction variable of the
loop.

I haven't thought yet about generalising this to ease future
implementation of more complex loop dependency analyses. 

> 2. We need some interface that can be implemented by a target machine
> to describe what the vector capabilities of the machine are.

Initially the vectorization pass would basically only need the SIMD
register width; later, the supported "packing formats" may come in handy
(i.e. the information that 128-bit registers that be used as e.g.
16 x i8, 8 x i16, and 4 x i32).

> 3. We need the transformation code.  Starting with a simple loop
> vectorizor (as opposed to an SLP system) would make sense to me.

I'm also inclined to prefer "classical" Allen & Kennedy-style
auto-vectorization over "Vectorization by unrolling"/SLP approaches.
Based on the responses on this list, it seems that this sentiment is
shared by the LLVM community. An SLP pass would certainly be a nice
complement to loop-based auto-vectorization, though

> Once the basics are in place, it can be feature-crept to support new
> idioms (reductions, alignment analysis, etc).  The first cut doesn't
> need to provide an immediate speedup, but should have many testcases
> that demonstrates the loop forms it can handle.

That's exactly my understanding of it. I want to limit this to the most
basic transformation possible; i.e.:

- non-nested loops, having
- a statically known constant trip count, which is
- divisible by the vectorization factor;
- a single basic block as loop body, with
- a single arithmetic instruction, using
- only same-sized vector operands, and
- ignoring alignment issues.

> Vectorizing to extremely wide vectors like this isn't along the lines
> of what we want: we should get something like the original loop
> unrolled 4 times to do operations on <4 x i32>'s (for x86/ppc).

Yes, I agree. See also my reply to Stefanus.

> > Auto-vectorization has numerous dependencies on other
> > analysis/transformation passes (loop normalization, constant/scalar
> > expansion, dependence analysis, etc). A significant part of the
> > proposed project is to evaluate already-available LLVM passes
> > regarding their applicability to support auto-vectorization and to
> > extend/implement what's missing.
> 
> Depending on how ambitious you are, you could also do a GSoC project
> just around a) building dependence analysis interface + some
> *non*-trivial implementations, and b) build some dep analysis clients
> around it (e.g. loop interchange, etc). This may be a less risky and
> higher impact project in the short term, if you are interested in it.

I understand the importance and necessity of a general loop dependence
interface, but I'm not too familiar with the more complex
approaches (which generally require integer linear programming, if I'm
not mistaken). So I can't really asses if there's a realistic chance to
build a generally usable non-trivial implementation in the time
available.

But as a simple loop dependence analysis pass will be one of the first
steps in this project anyway, I'll be in a much better position to
re-evaluate that question once this first pass is done. I'd certainly be
willing to modify the project, if it turns out that this direction would
be more valuable and/or more realistic to pursue.

> Great!  Either approach (building a very simple auto-vec system from
> end-to-end, or building some non-trivial dep analysis implementations
> + a client or two) sound like very promising GSoC projects!

Thanks for the extensive feedback!

-- 
Andreas