[LLVMdev] Vectorization: Next Steps

Thu Feb 9 08:21:48 PST 2012

On Wed, 2012-02-08 at 17:26 -0800, Chris Lattner wrote:
> On Feb 7, 2012, at 12:10 PM, Hal Finkel wrote:
> >>> 1. "Target Data" for vectorization - I think that in order to improve
> >>> the vectorization quality, the vectorizer will need more information
> >>> about the target. This information could be provided in the form of a
> >>> kind of extended target data. This extended target data might contain:
> >>> - What basic types can be vectorized, and how many of them will fit
> >>> into (the largest) vector registers
> >>> - What classes of operations can be vectorized (division, conversions /
> >>> sign extension, etc. are not always supported)
> >>> - What alignment is necessary for loads and stores
> >>> - Is scalar-to-vector free?
> >> 
> >> I think that this will be a really important API, but I strongly advocate that you model this after TargetLoweringInfo instead of TargetData.  First, TargetData isn't actually a target API (it should be fixed, I filed PR11936 to track this).  Second, targets will have to implement imperative code to return precise answers to questions.  For example, you'll want something like "what is the cost of a shuffle with this mask" which will be extremely target specific, will depend on what CPU subfeatures are enabled, etc.
> > 
> > This makes sense. What do you think will be the best way of
> > synchronizing things like CPU subfeatures between this API and the
> > backend target libraries? They could be linked directly, although I
> > don't know if we want to do that. tablegen could extract a bunch of this
> > information into separate objects that get linked into opt.
> 
> The best model we have at the moment is TargetLoweringInfo, as used by LoopStrengthReduction.  The details of this interface aren't a great example to follow for a few reasons (e.g. it has SelectionDAG-specific stuff in it, which is a layering violation), but the idea is sound.  This does mean that running "opt -vectorize foo.bc" would not get the same optimization as running clang with the target you want enabled.  We already have this problem with -loop-reduce, though.
> 
> >> I think that a loop vectorizer and a basic block vectorizer both make perfect sense and are important for different classes of code.  However, I don't think that we should go down the path of trying to make a "basic block vectorizer + loop unrolling" serve the purpose of a loop vectorizer.  Trying to make a BBVectorizer and a loop unroller play together will be really fragile, because they'll both have to duplicate the same metrics (otherwise, for example, you'd unroll a loop that isn't vectorizable).  This will also be a huge hit to compile time.
> > 
> > The only problem with this comes from loops for which unrolling is
> > necessary to expose vectorization because the memory access pattern is
> > too complicated to model in more-traditional loop vectorization. This
> > generally is useful only in cases with a large number of flops per
> > memory operation (or maybe integer ops too, but I have less experience
> > with those), so maybe we can design a useful heuristic to handle those
> > cases. That having been said, unroll+(failed vectorize)+rollback is not
> > really any more expensive at compile time than unroll+(failed vectorize)
> > except that the resulting code would run faster (actually it is cheaper
> > to compile because the optimization/compilation of the unvectorized
> > unrolled loop code takes longer than the non-unrolled loop). There might
> > be a clean way of doing this; I'll think about it.
> 
> I don't really understand the issue here, can you elaborate on when this might be a win?  I really don't like "speculatively unroll, try to do something, then reroll".  That is terrible for compile time and just strikes me as poor design :-)

From Ayal's e-mail, it seems that the gcc vectorizer contains
specialized unrolling code to handle these kinds of cases. With
appropriate refactoring, perhaps that is the best solution.
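
To make the two ideas in this thread a bit more concrete, here is a
rough C++ sketch of an imperative target query interface combined with
a "flops per memory operation" heuristic for deciding whether unrolling
a loop for vectorization is worth attempting at all. Every name in it
(VectorTargetInfo, Example128BitTarget, LoopBodySummary,
worthUnrollingForVectorization) is hypothetical and is not an existing
LLVM API; the shuffle costs and the default threshold of 4 are made-up
placeholders, not measured values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target query interface (NOT an existing LLVM class).
// A real target would answer these based on the enabled CPU subfeatures.
struct VectorTargetInfo {
  virtual ~VectorTargetInfo() = default;

  // Number of elements of the given bit width that fit into the widest
  // vector register; 0 means the type cannot be vectorized.
  virtual unsigned maxVectorElements(unsigned elemBits) const = 0;

  // Relative cost of a shuffle with the given mask (0 means free).
  virtual unsigned shuffleCost(const std::vector<int> &mask) const = 0;
};

// Illustrative target with 128-bit vector registers. The costs are
// placeholders chosen only for this example.
struct Example128BitTarget final : VectorTargetInfo {
  unsigned maxVectorElements(unsigned elemBits) const override {
    if (elemBits == 0 || elemBits > 64)
      return 0;
    return 128 / elemBits;
  }

  unsigned shuffleCost(const std::vector<int> &mask) const override {
    // An identity mask is treated as free; anything else costs 2.
    for (std::size_t i = 0; i < mask.size(); ++i)
      if (mask[i] != static_cast<int>(i))
        return 2;
    return 0;
  }
};

// Coarse per-iteration summary of a loop body (hypothetical).
struct LoopBodySummary {
  unsigned arithOps; // arithmetic operations per iteration
  unsigned memOps;   // loads and stores per iteration
  unsigned elemBits; // bit width of the dominant element type
};

// Returns true if unrolling in order to expose vectorization looks
// worthwhile: the target must hold at least two elements per vector
// register, and the loop must do at least minArithPerMemOp arithmetic
// operations per memory operation.
bool worthUnrollingForVectorization(const VectorTargetInfo &TI,
                                    const LoopBodySummary &L,
                                    unsigned minArithPerMemOp = 4) {
  assert(minArithPerMemOp > 0 && "threshold must be positive");
  if (TI.maxVectorElements(L.elemBits) < 2)
    return false;
  if (L.memOps == 0)
    return L.arithOps > 0;
  return L.arithOps >= minArithPerMemOp * L.memOps;
}
```

With a check like this made up front, the unroller and the vectorizer
would consult the same metric, so a loop that cannot profit is never
unrolled speculatively and nothing has to be rerolled afterwards.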

 -Hal

> 
> -Chris
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory