[LLVMdev] Vectorization: Next Steps
clattner at apple.com
Mon Feb 6 14:26:24 PST 2012
On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote:
> As some of you may know, I committed my basic-block autovectorization
> pass a few days ago. I encourage anyone interested to try it out (pass
> -vectorize to opt or -mllvm -vectorize to clang) and provide feedback.
> Especially in combination with -unroll-allow-partial, I have observed
> some significant benchmark speedups, but, I have also observed some
> significant slowdowns. I would like to share my thoughts, and hopefully
> get feedback, on next steps.
I haven't had a chance to look at your pass in detail, but here are some opinions: :)
> 1. "Target Data" for vectorization - I think that in order to improve
> the vectorization quality, the vectorizer will need more information
> about the target. This information could be provided in the form of a
> kind of extended target data. This extended target data might contain:
> - What basic types can be vectorized, and how many of them will fit
> into (the largest) vector registers
> - What classes of operations can be vectorized (division, conversions /
> sign extension, etc. are not always supported)
> - What alignment is necessary for loads and stores
> - Is scalar-to-vector free?
I think that this will be a really important API, but I strongly advocate that you model this after TargetLoweringInfo instead of TargetData. First, TargetData isn't actually a target API (it should be fixed, I filed PR11936 to track this). Second, targets will have to implement imperative code to return precise answers to questions. For example, you'll want something like "what is the cost of a shuffle with this mask" which will be extremely target specific, will depend on what CPU subfeatures are enabled, etc.
When you start working on this, I strongly encourage you to propose the API you want here. Start small and add features as you go.
> 2. Feedback between passes - We may to implement a closer coupling
> between optimization passes than currently exists. Specifically, I have
> in mind two things:
> - The vectorizer should communicate more closely with the loop
> unroller. First, the loop unroller should try to unroll to preserve
> maximal load/store alignments. Second, I think it would make a lot of
> sense to be able to unroll and, only if this helps vectorization should
> the unrolled version be kept in preference to the original. With basic
> block vectorization, it is often necessary to (partially) unroll in
> order to vectorize. Even when we also have real loop vectorization,
> however, I still think that it will be important for the loop unroller
> to communicate with the vectorizer.
I really disagree with this, see below.
> 3. Loop vectorization - It would be nice to have, in addition to
> basic-block vectorization, a more-traditional loop vectorization pass. I
> think that we'll need a better loop analysis pass in order for this to
> happen. Some of this was started in LoopDependenceAnalysis, but that
> pass is not yet finished. We'll need something like this to recognize
> affine memory references, etc.
I think that a loop vectorizor and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to use a "basic block vectorizor + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVectorizer and a loop unroller play together will be really fragile, because they'll both have to duplicate the same metrics (otherwise, for example, you'd unroll a loop that isn't vectorizable). This will also be a huge hit to compile time.
More information about the llvm-dev