[LLVMdev] Vectorization: Next Steps
hfinkel at anl.gov
Thu Feb 9 08:21:48 PST 2012
On Wed, 2012-02-08 at 17:26 -0800, Chris Lattner wrote:
> On Feb 7, 2012, at 12:10 PM, Hal Finkel wrote:
> >>> 1. "Target Data" for vectorization - I think that in order to improve
> >>> the vectorization quality, the vectorizer will need more information
> >>> about the target. This information could be provided in the form of a
> >>> kind of extended target data. This extended target data might contain:
> >>> - What basic types can be vectorized, and how many of them will fit
> >>> into (the largest) vector registers
> >>> - What classes of operations can be vectorized (division, conversions /
> >>> sign extension, etc. are not always supported)
> >>> - What alignment is necessary for loads and stores
> >>> - Is scalar-to-vector free?
> >> I think that this will be a really important API, but I strongly advocate that you model this after TargetLoweringInfo instead of TargetData. First, TargetData isn't actually a target API (it should be fixed; I filed PR11936 to track this). Second, targets will have to implement imperative code to return precise answers to questions. For example, you'll want something like "what is the cost of a shuffle with this mask", which will be extremely target specific, will depend on what CPU subfeatures are enabled, etc.
> > This makes sense. What do you think will be the best way of
> > synchronizing things like CPU subfeatures between this API and the
> > backend target libraries? They could be linked directly, although I
> > don't know if we want to do that. tablegen could extract a bunch of this
> > information into separate objects that get linked into opt.
> The best model we have at the moment is TargetLoweringInfo, as used by LoopStrengthReduction. The details of this interface aren't a great example to follow for a few reasons (e.g. it has SelectionDAG-specific stuff in it, which is a layering violation), but the idea is sound. This does mean that running "opt -vectorize foo.bc" would not get the same optimization as running clang with the target you want enabled. We already have this problem with -loop-reduce, though.
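To make the kind of target query interface discussed above a little more concrete, here is a minimal C++ sketch. Every name in it (VectorTargetInfo, ToyTarget128, ScalarKind, OpClass, the HasFastShuffle flag) is hypothetical and is not an existing LLVM interface, and the cost numbers are invented for illustration. It only shows the shape of the idea: the vectorizer asks questions through an abstract interface, and each target answers with imperative code that can depend on enabled CPU subfeatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of this is an existing LLVM API.
enum class ScalarKind { Int32, Float32, Float64 };
enum class OpClass { Add, Mul, Div, Convert };

// Abstract interface the vectorizer would query; each target implements it.
class VectorTargetInfo {
public:
  virtual ~VectorTargetInfo() = default;
  // Elements of this type that fit in the largest vector register
  // (0 would mean the type cannot be vectorized).
  virtual unsigned maxVectorElements(ScalarKind K) const = 0;
  // Whether this class of operation has a vector form for this type.
  virtual bool canVectorize(OpClass Op, ScalarKind K) const = 0;
  // Alignment, in bytes, needed for vector loads and stores.
  virtual unsigned requiredAlignment(ScalarKind K) const = 0;
  // Whether moving a scalar into a vector register is free.
  virtual bool isScalarToVectorFree() const = 0;
  // Cost of a shuffle with the given mask, in arbitrary units.
  virtual unsigned shuffleCost(const std::vector<int> &Mask) const = 0;
};

// Toy target with 128-bit registers. The subfeature flag changes the
// shuffle cost, which is why a static table would not be enough.
class ToyTarget128 : public VectorTargetInfo {
  bool HasFastShuffle;

public:
  explicit ToyTarget128(bool FastShuffle) : HasFastShuffle(FastShuffle) {}

  unsigned maxVectorElements(ScalarKind K) const override {
    return K == ScalarKind::Float64 ? 2 : 4; // 128 bits / element width
  }
  bool canVectorize(OpClass Op, ScalarKind K) const override {
    // Assumption for the toy target: no vector integer division.
    return !(Op == OpClass::Div && K == ScalarKind::Int32);
  }
  unsigned requiredAlignment(ScalarKind) const override { return 16; }
  bool isScalarToVectorFree() const override { return false; }
  unsigned shuffleCost(const std::vector<int> &Mask) const override {
    assert(!Mask.empty() && "shuffle mask must not be empty");
    // An identity mask moves nothing, so treat it as free.
    bool Identity = true;
    for (std::size_t I = 0; I < Mask.size(); ++I)
      if (Mask[I] != static_cast<int>(I))
        Identity = false;
    if (Identity)
      return 0;
    return HasFastShuffle ? 1 : 3; // invented costs
  }
};
```

With this sketch, ToyTarget128(true) and ToyTarget128(false) give different answers to shuffleCost for the same reversing mask, which is the subfeature dependence mentioned above.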
> >> I think that a loop vectorizer and a basic block vectorizer both make perfect sense and are important for different classes of code. However, I don't think that we should go down the path of trying to make a "basic block vectorizer + loop unrolling" serve the purpose of a loop vectorizer. Trying to make a BBVectorizer and a loop unroller play together will be really fragile, because they'll both have to duplicate the same metrics (otherwise, for example, you'd unroll a loop that isn't vectorizable). This will also be a huge hit to compile time.
> > The only problem with this comes from loops for which unrolling is
> > necessary to expose vectorization because the memory access pattern is
> > too complicated to model in more-traditional loop vectorization. This
> > generally is useful only in cases with a large number of flops per
> > memory operation (or maybe integer ops too, but I have less experience
> > with those), so maybe we can design a useful heuristic to handle those
> > cases. That having been said, unroll+(failed vectorize)+rollback is not
> > really any more expensive at compile time than unroll+(failed vectorize)
> > except that the resulting code would run faster (actually it is cheaper
> > to compile because the optimization/compilation of the unvectorized
> > unrolled loop code takes longer than the non-unrolled loop). There might
> > be a clean way of doing this; I'll think about it.
> I don't really understand the issue here, can you elaborate on when this might be a win? I really don't like "speculatively unroll, try to do something, then reroll". That is terrible for compile time and just strikes me as poor design :-)
From Ayal's e-mail, it seems that the gcc vectorizer contains
specialized unrolling code to handle these kinds of cases. With
appropriate refactoring, perhaps that is the best solution.
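
As a rough illustration of the heuristic mentioned in the quoted text above (unrolling to expose vectorization only when a loop has a large number of flops per memory operation), a minimal C++ sketch might look like the following. The LoopBodyStats struct, the function name, and the default threshold of 4 are all assumptions made up for this example; a real threshold would have to be tuned per target.

```cpp
#include <cassert>

// Hypothetical summary of one loop body; not an existing LLVM type.
struct LoopBodyStats {
  unsigned FloatOps; // floating-point arithmetic operations
  unsigned MemOps;   // loads plus stores
};

// Returns true when the loop body has at least MinFlopsPerMemOp
// floating-point operations per memory operation. The default of 4
// is an arbitrary placeholder, not a measured value.
bool worthUnrollingForVectorization(const LoopBodyStats &S,
                                    unsigned MinFlopsPerMemOp = 4) {
  assert(MinFlopsPerMemOp > 0 && "threshold must be positive");
  // With no memory operations, any arithmetic at all qualifies.
  if (S.MemOps == 0)
    return S.FloatOps > 0;
  // Compare by multiplication to avoid integer-division rounding.
  return S.FloatOps >= MinFlopsPerMemOp * S.MemOps;
}
```

A loop body with 16 flops and 2 memory operations passes the default threshold, while one with 4 flops and 4 memory operations does not, so the unroller would leave the second loop alone.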
Leadership Computing Facility
Argonne National Laboratory