[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?
Sander De Smalen via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 10 13:50:41 PDT 2021
Last year we added the InstructionCost class which adds the ability to
represent that an operation cannot be costed, i.e. operations that cannot
be expanded by the code-generator will have an invalid cost.
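To illustrate the idea, here is a minimal sketch of a cost that can be invalid. It is a simplified stand-in and not the actual InstructionCost class: the `Cost` type and all of its members are hypothetical names chosen for this example. The point is only that invalidity is sticky, so one operation that cannot be costed makes the total for the loop invalid.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a cost that may be invalid.
// This is NOT LLVM's InstructionCost; names are for illustration only.
class Cost {
  std::int64_t Value = 0;
  bool Valid = true;

public:
  Cost() = default;
  Cost(std::int64_t V) : Value(V) {}

  // Represents "this operation cannot be costed".
  static Cost invalid() {
    Cost C;
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // The numeric value is only meaningful for a valid cost.
  std::int64_t value() const {
    assert(Valid && "an invalid cost has no value");
    return Value;
  }

  // Invalidity propagates: adding anything to an invalid cost stays invalid.
  Cost &operator+=(const Cost &RHS) {
    Valid = Valid && RHS.Valid;
    Value += RHS.Value;
    return *this;
  }
};

inline Cost operator+(Cost LHS, const Cost &RHS) {
  LHS += RHS;
  return LHS;
}
```

With this model, summing the costs of all instructions in a loop yields an invalid total as soon as one of them is invalid, which is the property the rest of this message relies on.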
We started using this information in the Loop Vectorizer for scalable
auto-vectorization. The LV has a legality stage and a cost-model stage,
which are conceptually separate and serve different purposes. But now that
costs can be valid or invalid, it is more inviting to use the cost-model as
'legalisation', which leads us to the following question:
Should we be using the cost-model to do legalisation?
'Legalisation' in this context means asking the question beforehand if the
code-generator can handle the IR emitted from the LV. Examples of
operations that need such legalisation are predicated divides (at least
until we can use the llvm.vp intrinsics), or intrinsic calls that have no
scalable-vector equivalent. For fixed-width vectors this legalisation issue
is mostly moot, since operations on fixed-width vectors can be scalarised.
For scalable vectors this is neither supported nor feasible (see the note
on scalarisation at the end of this message).
This means there's the option to do one of two things:
Option 1: Add checks to the LV legalisation to see if scalable-vectorisation
is feasible. If so, assert the cost must be valid. Otherwise discard scalable
VFs as possible candidates.
  * This has the benefit that the compiler can avoid
    calculating/considering VPlans that we know cannot be costed.
  * Legalisation and cost-model keep each other in check. If something
    cannot be costed then either the cost-model or legalisation was
    incomplete.
Option 2: Leave the question about legalisation to the CostModel, i.e. if the
CostModel says that <operation> for `VF=vscale x N` is Invalid, then avoid
selecting that VF.
  * This has the benefit that we don't need to do work up-front to
    discard scalable VFs, keeping the LV design simpler.
  * This makes gaps in the cost-model more difficult to spot.
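The two options can be sketched side by side as below. This is a toy model and not LoopVectorizer code: `Loop`, `VF`, `CostResult`, `costOf`, `isScalableVectorizationLegal`, `selectOption1` and `selectOption2` are all hypothetical names, and the cost numbers are arbitrary. A single flag stands in for "the loop contains an operation with no scalable-vector equivalent", such as a predicated divide.

```cpp
#include <cassert>
#include <vector>

// All names below are hypothetical; the cost numbers are arbitrary.
struct Loop { bool HasOpWithoutScalableForm; };
struct VF { unsigned MinLanes; bool Scalable; };   // MinLanes >= 1
struct CostResult { bool Valid; unsigned Value; }; // Value unused if !Valid

// Toy cost model: wider is cheaper, and scalable VFs cannot be costed
// when the loop contains an operation without a scalable form.
CostResult costOf(const Loop &L, VF F) {
  if (F.Scalable && L.HasOpWithoutScalableForm)
    return {false, 0};
  unsigned Base = 16 / F.MinLanes;
  return {true, F.Scalable ? Base / 2 : Base};
}

// Toy legality check used by Option 1.
bool isScalableVectorizationLegal(const Loop &L) {
  return !L.HasOpWithoutScalableForm;
}

// Option 1: discard scalable VFs up-front, then assert costs are valid.
VF selectOption1(const Loop &L, const std::vector<VF> &Candidates) {
  bool ScalableOK = isScalableVectorizationLegal(L);
  VF Best{1, false}; // scalar fallback
  unsigned BestCost = costOf(L, Best).Value;
  for (VF F : Candidates) {
    if (F.Scalable && !ScalableOK)
      continue; // never costed at all
    CostResult C = costOf(L, F);
    assert(C.Valid && "legality and cost-model disagree");
    if (C.Value < BestCost) {
      Best = F;
      BestCost = C.Value;
    }
  }
  return Best;
}

// Option 2: cost every candidate and skip the ones with an invalid cost.
VF selectOption2(const Loop &L, const std::vector<VF> &Candidates) {
  VF Best{1, false}; // scalar fallback
  unsigned BestCost = costOf(L, Best).Value;
  for (VF F : Candidates) {
    CostResult C = costOf(L, F);
    if (!C.Valid)
      continue; // a missing cost-model entry would be silently skipped here too
    if (C.Value < BestCost) {
      Best = F;
      BestCost = C.Value;
    }
  }
  return Best;
}
```

In this sketch both functions pick the same VF for a given loop. The difference is where a gap shows up: in Option 1 an unexpected invalid cost trips the assert, while in Option 2 it is indistinguishable from a deliberately rejected VF.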
Note that it's not useful to combine Option 1 and Option 2: once the
cost-model is also allowed to reject a VF, there is no longer any need to do
legalisation beforehand, so the combination is effectively Option 2.
Both approaches lead to the same end-result, but we currently have a few
patches in flight that have taken Option 1, and this led to some questions
about the approach from both Florian and David Green. So we're looking to
reach a consensus and a decision on how to move forward.
I've tentatively added this as a topic to the agenda of the upcoming LLVM
SVE/Scalable Vector Sync-up meeting next Tuesday (June 15th) as an
opportunity to discuss this more freely if we can get enough people who
actively work on the LV together in that meeting (like Florian and David,
although please forward to anyone else who might have input on this).
Note on scalarisation: Expanding the vector operation into a scalarisation
loop is currently not supported. It could be done, but in our extensive
experimentation with loops that handle each element of a scalable
vector sequentially, this has never proved beneficial, even when
using special instructions to efficiently increment the predicate
vector. I doubt this will be any different for other scalable vector
architectures, because of the loop control overhead. Also the
insertion/extraction of elements from a scalable vector is unlikely to
be as cheap as for fixed-width vectors.
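As a rough illustration of the difference, written in plain C++ rather than IR: the function names and the use of std::array / std::vector to stand in for fixed-width and scalable vectors are assumptions made for this sketch only. For a fixed lane count the per-lane expansion has a compile-time trip count and can be fully unrolled. For a lane count known only at run time it has to remain a real loop, with the control overhead mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-width stand-in: N is a compile-time constant, so the per-lane
// expansion of a masked divide is a fixed sequence that can be unrolled.
template <std::size_t N>
std::array<int, N> scalarisedDivFixed(const std::array<int, N> &A,
                                      const std::array<int, N> &B,
                                      const std::array<bool, N> &Mask) {
  std::array<int, N> R = A; // inactive lanes keep A's value (arbitrary choice)
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I])
      R[I] = A[I] / B[I];
  return R;
}

// Scalable stand-in: the lane count is only known at run time, so the
// expansion is a genuine loop that visits each element sequentially.
std::vector<int> scalarisedDivScalable(const std::vector<int> &A,
                                       const std::vector<int> &B,
                                       const std::vector<bool> &Mask) {
  assert(A.size() == B.size() && A.size() == Mask.size());
  std::vector<int> R = A; // inactive lanes keep A's value (arbitrary choice)
  for (std::size_t I = 0, E = A.size(); I != E; ++I)
    if (Mask[I])
      R[I] = A[I] / B[I];
  return R;
}
```

The mask matters here because inactive lanes may hold a zero divisor, which is why a predicated divide cannot simply be executed unconditionally on every lane.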