[LLVMdev] Vectorization: Next Steps

Wed Feb 8 17:26:45 PST 2012

On Feb 7, 2012, at 12:10 PM, Hal Finkel wrote:
>>> 1. "Target Data" for vectorization - I think that in order to improve
>>> the vectorization quality, the vectorizer will need more information
>>> about the target. This information could be provided in the form of a
>>> kind of extended target data. This extended target data might contain:
>>> - What basic types can be vectorized, and how many of them will fit
>>> into (the largest) vector registers
>>> - What classes of operations can be vectorized (division, conversions /
>>> sign extension, etc. are not always supported)
>>> - What alignment is necessary for loads and stores
>>> - Is scalar-to-vector free?
>> 
>> I think that this will be a really important API, but I strongly advocate that you model this after TargetLoweringInfo instead of TargetData.  First, TargetData isn't actually a target API (it should be fixed; I filed PR11936 to track this).  Second, targets will have to implement imperative code to return precise answers to questions.  For example, you'll want something like "what is the cost of a shuffle with this mask", which will be extremely target specific, will depend on what CPU subfeatures are enabled, etc.
> 
> This makes sense. What do you think will be the best way of
> synchronizing things like CPU subfeatures between this API and the
> backend target libraries? They could be linked directly, although I
> don't know if we want to do that. tablegen could extract a bunch of this
> information into separate objects that get linked into opt.

The best model we have at the moment is TargetLoweringInfo, as used by LoopStrengthReduction.  The details of this interface aren't a great example to follow for a few reasons (e.g. it has SelectionDAG-specific stuff in it, which is a layering violation), but the idea is sound.  This does mean that running "opt -vectorize foo.bc" would not get the same optimization as running clang with the target you want enabled.  We already have this problem with -loop-reduce, though.
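
To make the "imperative code answering cost questions" idea concrete, here is a rough sketch of what such a hook might look like. Every name in it (VectorTargetInfo, ToyTargetInfo, SubtargetFeatures, getShuffleCost, and the cost numbers) is made up for illustration; none of this is an existing LLVM interface, and a real target would compute far more precise answers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical subtarget description; a real one would come from the backend.
struct SubtargetFeatures {
  bool HasByteShuffle = false; // assumed flag: a general byte-shuffle instruction exists
};

// Hypothetical query interface, in the spirit of a target lowering hook.
class VectorTargetInfo {
public:
  virtual ~VectorTargetInfo() = default;
  // How many elements of ElemBits bits fit in the widest vector register (0 if none).
  virtual unsigned getVectorWidth(unsigned ElemBits) const = 0;
  // Relative cost of a single-input shuffle with the given mask.
  virtual unsigned getShuffleCost(const std::vector<int> &Mask) const = 0;
};

// Toy target with 128-bit vector registers; the costs are invented.
class ToyTargetInfo : public VectorTargetInfo {
  SubtargetFeatures Features;

public:
  explicit ToyTargetInfo(SubtargetFeatures F) : Features(F) {}

  unsigned getVectorWidth(unsigned ElemBits) const override {
    return (ElemBits == 0 || ElemBits > 128) ? 0 : 128 / ElemBits;
  }

  unsigned getShuffleCost(const std::vector<int> &Mask) const override {
    // An identity mask needs no instruction at all.
    bool Identity = true;
    for (std::size_t I = 0; I < Mask.size(); ++I)
      if (Mask[I] != static_cast<int>(I))
        Identity = false;
    if (Identity)
      return 0;
    // With a general shuffle instruction, any mask is one operation;
    // otherwise assume one extract/insert pair per element.
    return Features.HasByteShuffle ? 1 : static_cast<unsigned>(Mask.size());
  }
};
```

The point of the virtual interface is that the answer depends on the mask and on the enabled subfeatures, which a static table like TargetData cannot express.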

>> I think that a loop vectorizer and a basic block vectorizer both make perfect sense and are important for different classes of code.  However, I don't think that we should go down the path of trying to use a "basic block vectorizer + loop unrolling" to serve the purpose of a loop vectorizer.  Trying to make a BBVectorizer and a loop unroller play together will be really fragile, because they'll both have to duplicate the same metrics (otherwise, for example, you'd unroll a loop that isn't vectorizable).  This will also be a huge hit to compile time.
> 
> The only problem with this comes from loops for which unrolling is
> necessary to expose vectorization because the memory access pattern is
> too complicated to model in more-traditional loop vectorization. This
> generally is useful only in cases with a large number of flops per
> memory operation (or maybe integer ops too, but I have less experience
> with those), so maybe we can design a useful heuristic to handle those
> cases. That having been said, unroll+(failed vectorize)+rollback is not
> really any more expensive at compile time than unroll+(failed vectorize)
> except that the resulting code would run faster (actually it is cheaper
> to compile because the optimization/compilation of the unvectorized
> unrolled loop code takes longer than the non-unrolled loop). There might
> be a clean way of doing this; I'll think about it.
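
One possible shape for the heuristic mentioned above (unroll for vectorization only when there are many arithmetic operations per memory operation) is sketched here. The struct, the function name, and the default threshold of 4.0 are all assumptions made for illustration, not something proposed in this thread or present in LLVM.

```cpp
// Hypothetical per-iteration operation counts for a loop body.
struct LoopOpCounts {
  unsigned ArithOps; // floating-point or integer arithmetic operations
  unsigned MemOps;   // loads and stores
};

// Returns true when the loop is arithmetic-heavy enough that unrolling
// it to expose vectorization might pay off. The threshold is a guess.
bool worthUnrollingForVectorization(const LoopOpCounts &C,
                                    double MinArithPerMem = 4.0) {
  if (C.ArithOps == 0)
    return false; // nothing to vectorize
  if (C.MemOps == 0)
    return true; // pure computation, no memory traffic to model
  return static_cast<double>(C.ArithOps) / C.MemOps >= MinArithPerMem;
}
```

A check like this would run before any unrolling, so a loop that fails it is never unrolled and nothing has to be rolled back.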

I don't really understand the issue here; can you elaborate on when this might be a win?  I really don't like "speculatively unroll, try to do something, then reroll".  That is terrible for compile time and just strikes me as poor design :-)

-Chris