[LLVMdev] Vectorization: Next Steps

Mon Feb 13 21:04:10 PST 2012

On Feb 9, 2012, at 8:21 AM, Hal Finkel wrote:
>>> The only problem with this comes from loops for which unrolling is
>>> necessary to expose vectorization because the memory access pattern is
>>> too complicated to model in more-traditional loop vectorization. This
>>> generally is useful only in cases with a large number of flops per
>>> memory operation (or maybe integer ops too, but I have less experience
>>> with those), so maybe we can design a useful heuristic to handle those
>>> cases. That having been said, unroll+(failed vectorize)+rollback is not
>>> really any more expensive at compile time than unroll+(failed vectorize),
>>> and the resulting code would run faster. (It is actually cheaper to
>>> compile, because optimizing and compiling the unvectorized unrolled
>>> loop takes longer than doing so for the non-unrolled loop.) There might
>>> be a clean way of doing this; I'll think about it.
>> 
>> I don't really understand the issue here; can you elaborate on when this might be a win?  I really don't like "speculatively unroll, try to do something, then reroll".  That is terrible for compile time and just strikes me as poor design :-)
> 
> From Ayal's e-mail, it seems that the gcc vectorizer contains
> specialized unrolling code to handle these kinds of cases. With
> appropriate refactoring, perhaps that is the best solution.

Sure, I'm fully aware that a lot of compilers use the unroll-and-vectorize approach.  However, just because other compilers do it doesn't mean it isn't a total hack ;-).  Is there a principled reason that unrolling is a better approach?

-Chris
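
For readers following the thread, here is a minimal sketch of what "unroll to expose vectorization" means. It is an illustration only and not code from the thread: the function names axpy_rolled and axpy_unrolled4 are hypothetical, and the loop is deliberately simpler than the complicated access patterns Hal describes. After unrolling by 4, the loop body is straight-line code with four independent statements, which is the shape a basic-block vectorizer could try to pack into one 4-wide operation.

```c
#include <stddef.h>

/* Rolled form: one scalar multiply-add per iteration. */
void axpy_rolled(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hand-unrolled by 4 (hypothetical illustration). The four statements in
 * the main loop body are independent and touch adjacent elements, so a
 * basic-block vectorizer could in principle combine them into a single
 * 4-wide multiply-add. The second loop handles the leftover iterations. */
void axpy_unrolled4(float *y, const float *x, float a, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i + 0] = a * x[i + 0] + y[i + 0];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

If vectorization of the unrolled body fails, the compiler is left with the larger unrolled scalar loop, which is the situation the "rollback" discussion above is about.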