[llvm-dev] [RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Martin J. O'Riordan via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 16 01:02:01 PDT 2016
Our architecture has two different sizes of vector register, with separate register files and functional units for each, and the existing cost model already makes optimisation for this quite difficult. Ideally the loop-vectoriser would be able to vectorise the vectorisable code in a loop using both in parallel. At the moment the architectures that are in the TRUNK for LLVM all use a single size for vector registers and a single register file for them, but I expect there are other out-of-tree targets that use multiple vector register widths.
Removing the width limitation altogether would, I think, make optimisation for hybrid vector models such as ours less difficult, but it also means that the cost model should be able to query for the vector width and expect to get a list of widths rather than the single value it gets now. The number of vector registers returned by a query should likewise be a function of the vector type being examined.
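To make the shape of such a query concrete, here is a rough sketch. It is not the existing LLVM target interface: the type HypotheticalTargetInfo, its member names, and the register-file sizes used with it are assumptions made up for illustration. It returns a list of vector register widths and makes the register count depend on the width of the vector type being asked about.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one vector register file (not an LLVM type).
struct VectorRegisterClass {
  unsigned widthBits;     // width of each register in this file
  unsigned numRegisters;  // how many registers the file holds
};

// Hypothetical target description with several vector register files.
struct HypotheticalTargetInfo {
  std::vector<VectorRegisterClass> classes;

  // Returns every vector register width, instead of a single value.
  std::vector<unsigned> getVectorRegisterWidths() const {
    std::vector<unsigned> widths;
    for (const VectorRegisterClass &rc : classes)
      widths.push_back(rc.widthBits);
    return widths;
  }

  // Returns the register count of the narrowest file that can hold a vector
  // of the given total width, or 0 if the type fits in no file.
  unsigned getNumberOfRegisters(unsigned vectorTypeBits) const {
    assert(vectorTypeBits > 0 && "vector type must have a non-zero width");
    const VectorRegisterClass *best = nullptr;
    for (const VectorRegisterClass &rc : classes)
      if (rc.widthBits >= vectorTypeBits &&
          (!best || rc.widthBits < best->widthBits))
        best = &rc;
    return best ? best->numRegisters : 0;
  }
};
```

With two made-up files of 32 x 64-bit and 16 x 128-bit registers, a 64-bit vector type would see 32 registers and a 128-bit one would see 16, which is the per-type answer the cost model would need.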
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Michael Kuperstein via llvm-dev
Sent: 15 June 2016 23:48
To: Hal Finkel <hfinkel at anl.gov>; Nadav Rotem <nadav.rotem at me.com>; Ayal Zaks <ayal.zaks at intel.com>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Adam Nemet <anemet at apple.com>; Sanjoy Das <sanjoy at playingwithpointers.com>; James Molloy <james.molloy at arm.com>; Matthew Simpson <mssimpso at codeaurora.org>; Sanjay Patel <spatel at rotateright.com>; Chandler Carruth <chandlerc at google.com>; David Li <davidxl at google.com>; Wei Mi <wmi at google.com>; Dehao Chen <dehao at google.com>; Cong Hou <congh at google.com>
Cc: Llvm Dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] [RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Currently the loop vectorizer will, by default, not consider vectorization factors that would make it generate types that do not fit into the target platform's vector registers. That is, if the widest scalar type in the scalar loop is i64, and the platform's largest vector register is 256 bits wide, we will not consider a VF above 4.
We have a command line option (-mllvm -vectorizer-maximize-bandwidth) that will choose VFs for consideration based on the narrowest scalar type instead of the widest one, but I don't believe it has been widely tested. If anyone has had an opportunity to play around with it, I'd love to hear about the results.
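As a small illustration of the difference between the two policies, the sketch below computes the largest VF under each. The function maxVF and its parameters are hypothetical names for this example, not the vectorizer's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: the largest VF such that VF elements of the chosen
// scalar width fit into one vector register. With maximizeBandwidth set, the
// narrowest scalar type in the loop is chosen; otherwise the widest one is.
unsigned maxVF(unsigned vectorRegisterBits,
               const std::vector<unsigned> &scalarTypeBits,
               bool maximizeBandwidth) {
  assert(!scalarTypeBits.empty() && "loop must use at least one scalar type");
  unsigned narrowest =
      *std::min_element(scalarTypeBits.begin(), scalarTypeBits.end());
  unsigned widest =
      *std::max_element(scalarTypeBits.begin(), scalarTypeBits.end());
  unsigned chosen = maximizeBandwidth ? narrowest : widest;
  assert(chosen > 0 && chosen <= vectorRegisterBits);
  return vectorRegisterBits / chosen;
}
```

For a loop mixing i8 and i64 on a target with 256-bit registers, the default policy gives 256 / 64 = 4, while the maximize-bandwidth policy gives 256 / 8 = 32. At VF = 32 the i64 operations become <32 x i64>, which is an illegal type that the legalizer has to split.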
What I'd like to do is:
Step 1: Make -vectorizer-maximize-bandwidth the default. This should improve the performance of loops that contain mixed-width types.
Step 2: Remove the artificial width limitation altogether, and base the vectorization factor decision purely on the cost model. This should allow us to get rid of the interleaving code in the loop vectorizer, and get interleaving for "free" from the legalizer instead.
There are two potential road-blocks I see - the cost-model, and the legalizer. To make this work, we need to:
a) Model the cost of operations on illegal types better. Right now, what we get is sometimes completely ridiculous (e.g. see http://reviews.llvm.org/D21251).
b) Make sure the cost model actually stops us when the VF becomes too large. This is mostly a question of correctly estimating the register pressure. In theory, that should not be an issue - we already rely on this estimate to choose the interleaving factor, so using the same logic to upper-bound the VF directly shouldn't make things worse.
c) Ensure the legalizer is up to the task of emitting good code for overly wide vectors. I've talked about this with Chandler, and his opinion (Chandler, please correct me if I'm wrong) is that on x86, the legalizer is likely to be able to handle this. This may not be true for other platforms. So, I'd like to try to make this the default on a platform-by-platform basis, starting with x86.
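To sketch what point (b) might look like, here is a deliberately simplified register-pressure bound. It is not the estimate the vectorizer actually uses: the function names are hypothetical, and it assumes every value in the list is live at the same time and that the legalizer splits an over-wide vector into ceil(VF * bits / registerBits) registers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical estimate: registers needed to hold all simultaneously live
// values at a given VF, assuming each illegal vector is split into as many
// legal registers as its total width requires.
unsigned registersNeeded(unsigned VF,
                         const std::vector<unsigned> &liveScalarBits,
                         unsigned registerBits) {
  assert(VF > 0 && registerBits > 0);
  unsigned total = 0;
  for (unsigned bits : liveScalarBits)
    total += (VF * bits + registerBits - 1) / registerBits;  // round up
  return total;
}

// Hypothetical upper bound: the largest power-of-two VF, up to vfLimit, whose
// estimated pressure still fits in the register file. Returns 1 (scalar) if
// no vector VF fits.
unsigned largestVFWithinPressure(const std::vector<unsigned> &liveScalarBits,
                                 unsigned registerBits, unsigned numRegisters,
                                 unsigned vfLimit) {
  unsigned best = 1;
  for (unsigned VF = 2; VF <= vfLimit; VF *= 2) {
    // The estimate never decreases as VF grows, so stop at the first overflow.
    if (registersNeeded(VF, liveScalarBits, registerBits) > numRegisters)
      break;
    best = VF;
  }
  return best;
}
```

With live values of 64, 64 and 32 bits, 256-bit registers and 16 registers, this estimate needs 10 registers at VF = 16 and 20 at VF = 32, so the bound stops at 16. The cost model would still have to decide whether a VF below that bound is actually profitable.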
What do you think? Does this seem like a step in the right direction? Anything important I'm missing?