[llvm-dev] [RFC] Allow loop vectorizer to choose vector widths that generate illegal types

Wed Jun 15 16:00:16 PDT 2016

I know we already talked about this and so I'm more interested in others'
thoughts, but just to explicitly say it, this LGTM. I particularly think
that using extra-wide vectors to model widening-for-interleaving is a much
cleaner model in the IR.

Also, at least one other user of the IR's vector capabilities is doing
precisely this: Halide. I'm pretty happy to see convergence here, with
Halide and the loop vectorizer generating more similar patterns.

On Wed, Jun 15, 2016 at 3:48 PM Michael Kuperstein <mkuper at google.com>
wrote:

> Hello,
>
> Currently the loop vectorizer will, by default, not consider vectorization
> factors that would make it generate types that do not fit into the target
> platform's vector registers. That is, if the widest scalar type in the
> scalar loop is i64, and the platform's largest vector register is 256-bit
> wide, we will not consider a VF above 4.
>
> We have a command line option (-mllvm -vectorizer-maximize-bandwidth),
> that will choose VFs for consideration based on the narrowest scalar type
> instead of the widest one, but I don't believe it has been widely tested.
> If anyone has had an opportunity to play around with it, I'd love to hear
> about the results.
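
To make the two bounds concrete, here is a minimal Python sketch. It is not LLVM's actual code, and `max_vf` is a hypothetical helper. It assumes power-of-two widths and computes the VF cap from either the widest scalar type in the loop (the default) or the narrowest one (maximize-bandwidth):

```python
# Illustrative model of the two VF upper bounds; not LLVM's implementation.
# All widths are in bits and assumed to be powers of two.

def max_vf(register_bits: int, scalar_bits: list, maximize_bandwidth: bool) -> int:
    """Largest VF at which the limiting scalar type exactly fills one register."""
    # Default: limit by the widest type, so every vector type stays legal.
    # Maximize-bandwidth: limit by the narrowest type instead.
    limiting = min(scalar_bits) if maximize_bandwidth else max(scalar_bits)
    return register_bits // limiting

# A loop mixing i8 and i64 values on a target with 256-bit vector registers.
types = [8, 64]
print(max_vf(256, types, maximize_bandwidth=False))  # 4  (256 / 64)
print(max_vf(256, types, maximize_bandwidth=True))   # 32 (256 / 8)
```

With maximize-bandwidth, the i64 values in such a loop become <32 x i64>, which is wider than any register. That is exactly the kind of illegal type the rest of this proposal is concerned with.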
>
> What I'd like to do is:
> Step 1: Make -vectorizer-maximize-bandwidth the default. This should
> improve the performance of loops that contain mixed-width types.
> Step 2: Remove the artificial width limitation altogether, and base the
> vectorization factor decision purely on the cost model. This should allow
> us to get rid of the interleaving code in the loop vectorizer, and get
> interleaving for "free" from the legalizer instead.
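
A rough sketch of what "interleaving for free" means, assuming the legalizer simply splits an over-wide vector into register-sized pieces. `split_parts` is a hypothetical helper, and real type legalization also promotes and widens types:

```python
# Toy model of type legalization by splitting; not LLVM's legalizer.
# Assumes power-of-two VFs and element widths and a single register width.

def split_parts(vf: int, elem_bits: int, register_bits: int) -> int:
    """Number of register-sized operations a <vf x iN> operation becomes."""
    total_bits = vf * elem_bits
    # A vector that fits in one register needs no splitting (one part).
    return max(1, total_bits // register_bits)

# <16 x i64> on a 256-bit target splits into four <4 x i64> operations.
parts = split_parts(16, 64, 256)
print(parts)        # 4
print(16 // parts)  # 4 lanes per part
```

The four parts are independent operations. That is roughly the shape the interleaving code produces today for VF=4 with an interleave count of 4, which is why the legalizer could take over that job.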
>
> I see two potential roadblocks: the cost model and the legalizer. To make
> this work, we need to:
> a) Model the cost of operations on illegal types better. Right now, the
> costs we get are sometimes completely ridiculous (see, e.g.,
> http://reviews.llvm.org/D21251).
> b) Make sure the cost model actually stops us when the VF becomes too
> large. This is mostly a question of correctly estimating the register
> pressure. In theory, that should not be an issue: we already rely on this
> estimate to choose the interleaving factor, so using the same logic to
> upper-bound the VF directly shouldn't make things worse.
> c) Ensure the legalizer is up to the task of emitting good code for overly
> wide vectors. I've talked about this with Chandler, and his opinion
> (Chandler, please correct me if I'm wrong) is that on x86, the legalizer is
> likely to be able to handle this. This may not be true for other platforms.
> So, I'd like to try to make this the default on a platform-by-platform
> basis, starting with x86.
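
One way to picture Step 2 together with (b) is to pick whichever VF gives the lowest cost per scalar iteration and let register pressure be the only hard stop. Everything below is hypothetical. The cost and pressure functions are toy stand-ins rather than TTI numbers, and the search assumes that pressure grows monotonically with VF:

```python
import math

# Hypothetical target and loop parameters, for illustration only.
REG_BITS = 256     # vector register width
ELEM_BITS = 64     # widest element type in the loop (i64)
LIVE_VECTORS = 3   # vector values assumed live at the same time

def parts(vf):
    # Register-sized pieces a <vf x i64> value is split into by legalization.
    return math.ceil(vf * ELEM_BITS / REG_BITS)

def toy_cost(vf):
    # One op per legal part plus a fixed per-iteration overhead (made-up).
    return parts(vf) + 1

def toy_regs(vf):
    # Registers needed to hold all live values at this VF.
    return LIVE_VECTORS * parts(vf)

def choose_vf(cost, regs_needed, num_regs, max_vf=64):
    """Return the power-of-two VF with the lowest cost per scalar iteration,
    stopping at the first VF whose register demand exceeds num_regs."""
    best_vf, best_cost = 1, cost(1) / 1
    vf = 2
    while vf <= max_vf:
        if regs_needed(vf) > num_regs:
            break  # would spill; assumes larger VFs only get worse
        per_lane = cost(vf) / vf
        if per_lane < best_cost:
            best_vf, best_cost = vf, per_lane
        vf *= 2
    return best_vf

# e.g. 16 vector registers, as with AVX2 in 64-bit mode
print(choose_vf(toy_cost, toy_regs, num_regs=16))  # 16
print(choose_vf(toy_cost, toy_regs, num_regs=8))   # 8
```

With 16 registers, VF=16 (four <4 x i64> parts, 12 registers) wins, and VF=32 is rejected because it would need 24 registers. With only 8 registers, the pressure check stops the search at VF=8 instead.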
>
> What do you think? Does this seem like a step in the right direction?
> Anything important I'm missing?
>
> Thanks,
>   Michael
>