[llvm-dev] [RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Michael Kuperstein via llvm-dev
llvm-dev at lists.llvm.org
Wed Jun 15 16:25:41 PDT 2016
If anyone wants to volunteer to test this on their workloads once the cost
model is less broken (so we actually try to use higher VFs instead of
rejecting them on cost grounds), that would be great.
On Wed, Jun 15, 2016 at 4:12 PM, Xinliang David Li <davidxl at google.com> wrote:
> Michael, thanks for driving this! My only comment is that before the
> final flip, we need to engage the community for more extensive performance
> testing on various architectures.
> On Wed, Jun 15, 2016 at 3:47 PM, Michael Kuperstein <mkuper at google.com> wrote:
>> Currently the loop vectorizer will, by default, not consider
>> vectorization factors that would make it generate types that do not fit
>> into the target platform's vector registers. That is, if the widest scalar
>> type in the scalar loop is i64, and the platform's largest vector register
>> is 256 bits wide, we will not consider a VF above 4 (256 / 64 = 4).
>> We have a command line option (-mllvm -vectorizer-maximize-bandwidth)
>> that chooses VFs for consideration based on the narrowest scalar type
>> instead of the widest one, but I don't believe it has been widely tested.
>> If anyone has had an opportunity to play around with it, I'd love to hear
>> about the results.
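
To make the two VF caps described above concrete, here is a minimal sketch.
The helper names are hypothetical and are not LLVM APIs; the sketch assumes
the cap is simply the register width divided by the chosen scalar width.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): the largest VF for which a vector
// of `elemBits`-wide elements still fits in one `regBits`-wide register.
unsigned maxVFForType(unsigned regBits, unsigned elemBits) {
  assert(elemBits != 0 && regBits >= elemBits && "type must fit in a register");
  return regBits / elemBits;
}

// Default behaviour: the cap is derived from the widest scalar type.
unsigned maxVFDefault(unsigned regBits, unsigned widestBits) {
  return maxVFForType(regBits, widestBits);
}

// With -vectorizer-maximize-bandwidth: the cap is derived from the
// narrowest scalar type, so wider types in the loop become illegal vectors.
unsigned maxVFMaximizeBandwidth(unsigned regBits, unsigned narrowestBits) {
  return maxVFForType(regBits, narrowestBits);
}
```

For a loop mixing i8 and i64 on a 256-bit target, maxVFDefault(256, 64)
gives 4, while maxVFMaximizeBandwidth(256, 8) allows VFs up to 32.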
>> What I'd like to do is:
>> Step 1: Make -vectorizer-maximize-bandwidth the default. This should
>> improve the performance of loops that contain mixed-width types.
>> Step 2: Remove the artificial width limitation altogether, and base the
>> vectorization factor decision purely on the cost model. This should allow
>> us to get rid of the interleaving code in the loop vectorizer, and get
>> interleaving for "free" from the legalizer instead.
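
As a rough illustration of interleaving for "free": if the legalizer splits
an over-wide vector evenly into register-sized pieces, one operation at a
large VF becomes several legal operations, much like a smaller VF with an
interleave count. The helper below is hypothetical and assumes plain even
splitting; real type legalization may also widen, promote, or scalarize.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of legal-width vector
// operations that one operation on <vf x i(elemBits)> turns into when it
// is split to fit `regBits`-wide registers. Assumes even splitting only.
unsigned numLegalParts(unsigned vf, unsigned elemBits, unsigned regBits) {
  assert(vf != 0 && elemBits != 0 && regBits != 0);
  unsigned totalBits = vf * elemBits;
  return (totalBits + regBits - 1) / regBits; // ceiling division
}
```

Under these assumptions, numLegalParts(16, 64, 256) is 4: a VF of 16 over
i64 has the same shape as a VF of 4 interleaved 4 times.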
>> There are two potential roadblocks I see: the cost model and the
>> legalizer. To make this work, we need to:
>> a) Model the cost of operations on illegal types better. Right now, what
>> we get is sometimes completely ridiculous (e.g. see
>> b) Make sure the cost model actually stops us when the VF becomes too
>> large. This is mostly a question of correctly estimating the register
>> pressure. In theory, that should not be an issue - we already rely on this
>> estimate to choose the interleaving factor, so using the same logic to
>> upper-bound the VF directly shouldn't make things worse.
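
A toy model of what upper-bounding the VF by register pressure could look
like. This is not LLVM's cost model: the function name, its parameters, and
the rule "live values times registers per value must fit in the register
file" are all assumptions made for illustration.

```cpp
#include <cassert>

// Toy model (assumption, not LLVM code): return the largest power-of-two
// VF such that `maxLiveValues` simultaneously live vector values, each
// split into register-sized parts, still fit in `numVecRegs` registers.
unsigned maxVFUnderPressure(unsigned numVecRegs, unsigned regBits,
                            unsigned elemBits, unsigned maxLiveValues) {
  assert(numVecRegs != 0 && regBits != 0 && elemBits != 0 &&
         maxLiveValues != 0);
  unsigned best = 1;
  // 4096 is an arbitrary safety cap on the VFs tried in this sketch.
  for (unsigned vf = 1; vf <= 4096; vf *= 2) {
    unsigned parts = (vf * elemBits + regBits - 1) / regBits;
    if (parts * maxLiveValues > numVecRegs)
      break; // this VF would spill under the toy model
    best = vf;
  }
  return best;
}
```

With 16 vector registers of 256 bits, i64 elements, and 4 live values, the
model allows a VF of up to 16; with 8 live values the bound drops to 8.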
>> c) Ensure the legalizer is up to the task of emitting good code for
>> overly wide vectors. I've talked about this with Chandler, and his opinion
>> (Chandler, please correct me if I'm wrong) is that on x86, the legalizer is
>> likely to be able to handle this. This may not be true for other platforms.
>> So, I'd like to try to make this the default on a platform-by-platform
>> basis, starting with x86.
>> What do you think? Does this seem like a step in the right direction?
>> Anything important I'm missing?