[LLVMdev] Vectorization: Next Steps

Mon Feb 6 14:26:24 PST 2012

On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote:
> As some of you may know, I committed my basic-block autovectorization
> pass a few days ago. I encourage anyone interested to try it out (pass
> -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> Especially in combination with -unroll-allow-partial, I have observed
> some significant benchmark speedups, but, I have also observed some
> significant slowdowns. I would like to share my thoughts, and hopefully
> get feedback, on next steps.

Hi Hal,

I haven't had a chance to look at your pass in detail, but here are some opinions: :)

> 1. "Target Data" for vectorization - I think that in order to improve
> the vectorization quality, the vectorizer will need more information
> about the target. This information could be provided in the form of a
> kind of extended target data. This extended target data might contain:
> - What basic types can be vectorized, and how many of them will fit
> into (the largest) vector registers
> - What classes of operations can be vectorized (division, conversions /
> sign extension, etc. are not always supported)
> - What alignment is necessary for loads and stores
> - Is scalar-to-vector free?

I think that this will be a really important API, but I strongly advocate that you model this after TargetLoweringInfo instead of TargetData.  First, TargetData isn't actually a target API (it should be fixed, I filed PR11936 to track this).  Second, targets will have to implement imperative code to return precise answers to questions.  For example, you'll want something like "what is the cost of a shuffle with this mask" which will be extremely target specific, will depend on what CPU subfeatures are enabled, etc.

When you start working on this, I strongly encourage you to propose the API you want here.  Start small and add features as you go.

> 2. Feedback between passes - We may to implement a closer coupling
> between optimization passes than currently exists. Specifically, I have
> in mind two things:
> - The vectorizer should communicate more closely with the loop
> unroller. First, the loop unroller should try to unroll to preserve
> maximal load/store alignments. Second, I think it would make a lot of
> sense to be able to unroll and, only if this helps vectorization should
> the unrolled version be kept in preference to the original. With basic
> block vectorization, it is often necessary to (partially) unroll in
> order to vectorize. Even when we also have real loop vectorization,
> however, I still think that it will be important for the loop unroller
> to communicate with the vectorizer.

I really disagree with this, see below.

> 3. Loop vectorization - It would be nice to have, in addition to
> basic-block vectorization, a more-traditional loop vectorization pass. I
> think that we'll need a better loop analysis pass in order for this to
> happen. Some of this was started in LoopDependenceAnalysis, but that
> pass is not yet finished. We'll need something like this to recognize
> affine memory references, etc.

I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code.  However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer.  Trying to make a BBVectorizer and a loop unroller play together will be really fragile, because they'll both have to duplicate the same metrics (otherwise, for example, you'd unroll a loop that isn't vectorizable).  This will also be a huge hit to compile time.

-Chris