[LLVMdev] Vectorization: Next Steps

Roel Jordans r.jordans at tue.nl
Thu Feb 9 02:04:10 PST 2012



On 02/09/2012 02:26 AM, Chris Lattner wrote:
>>> I think that a loop vectorizer and a basic block vectorizer both make perfect sense and are important for different classes of code.  However, I don't think that we should go down the path of trying to make a "basic block vectorizer + loop unrolling" serve the purpose of a loop vectorizer.  Trying to make a BBVectorizer and a loop unroller play together will be really fragile, because they'll both have to duplicate the same metrics (otherwise, for example, you'd unroll a loop that isn't vectorizable).  This will also be a huge hit to compile time.
>>
>> The only problem with this comes from loops for which unrolling is
>> necessary to expose vectorization because the memory access pattern is
>> too complicated to model in more-traditional loop vectorization. This
>> generally is useful only in cases with a large number of flops per
>> memory operation (or maybe integer ops too, but I have less experience
>> with those), so maybe we can design a useful heuristic to handle those
>> cases. That having been said, unroll+(failed vectorize)+rollback is not
>> really any more expensive at compile time than unroll+(failed vectorize)
>> except that the resulting code would run faster (actually it is cheaper
>> to compile because the optimization/compilation of the unvectorized
>> unrolled loop code takes longer than the non-unrolled loop). There might
>> be a clean way of doing this; I'll think about it.
>
> I don't really understand the issue here, can you elaborate on when this might be a win?  I really don't like "speculatively unroll, try to do something, then reroll".  That is terrible for compile time and just strikes me as poor design :-)
>

This seems somewhat related to Resource-Directed Loop Pipelining (RDLP) [1].
RDLP uses loop unrolling in combination with loop shifting (or peeling)
to map a loop body onto a parallel architecture. It was originally aimed
at VLIW-like parallelism, but I think that a similar technique may be
useful for vectorization.
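
As a rough illustration of what unrolling combined with shifting does to a
loop, here is a minimal hand-written C sketch. It is not the algorithm from
[1], and the function names and the two-stage loop body (a multiply followed
by a dependent add) are made up for the example. The point is only that,
after the transformation, the multiplies inside the loop body no longer feed
the adds in the same iteration, so the two groups could be packed
independently.

```c
#include <stddef.h>

/* Original loop: each iteration is a multiply followed by a dependent add. */
void body_ref(const float *a, const float *b, float d, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float t = a[i] * b[i];
        c[i] = t + d;
    }
}

/*
 * Unrolled by two, then shifted by one stage: the multiply of the first
 * iteration is peeled into a prologue, so each trip through the main loop
 * performs the multiplies for iterations i+1 and i+2 and the adds for
 * iterations i and i+1. The multiplies and adds in the loop body are now
 * independent of each other.
 */
void body_unrolled_shifted(const float *a, const float *b, float d,
                           float *c, size_t n)
{
    if (n == 0)
        return;

    size_t i = 0;
    float t0 = a[0] * b[0];              /* prologue (peeled multiply) */

    for (; i + 2 < n; i += 2) {
        float t1 = a[i + 1] * b[i + 1];  /* multiply for iteration i+1 */
        float t2 = a[i + 2] * b[i + 2];  /* multiply for iteration i+2 */
        c[i]     = t0 + d;               /* add for iteration i        */
        c[i + 1] = t1 + d;               /* add for iteration i+1      */
        t0 = t2;                         /* carried into the next trip */
    }

    /* Epilogue: finish the remaining one or two iterations. */
    for (; i < n; ++i) {
        c[i] = t0 + d;
        if (i + 1 < n)
            t0 = a[i + 1] * b[i + 1];
    }
}

/*
 * Hypothetical helper for checking the sketch: returns 1 if both versions
 * produce identical output for n elements (n <= 64), 0 otherwise.
 */
int outputs_match(size_t n)
{
    enum { MAX_N = 64 };
    float a[MAX_N], b[MAX_N], ref[MAX_N], out[MAX_N];

    if (n > (size_t)MAX_N)
        return 0;

    for (size_t i = 0; i < n; ++i) {
        a[i] = (float)(i + 1);           /* small integers: exact in float */
        b[i] = 2.0f;
        ref[i] = 0.0f;
        out[i] = 0.0f;
    }

    body_ref(a, b, 3.0f, ref, n);
    body_unrolled_shifted(a, b, 3.0f, out, n);

    for (size_t i = 0; i < n; ++i) {
        if (ref[i] != out[i])
            return 0;
    }
    return 1;
}
```

Whether a vectorizer could actually pack the two multiplies (and the two
adds) depends on the target and on what it can prove about aliasing between
the arrays, so this shows only the shape of the transformed loop.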

Cheers,
Roel

[1] http://comjnl.oxfordjournals.org/content/40/6/311.short


