[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?

Thu Jun 10 13:50:41 PDT 2021

Hi,

Last year we added the InstructionCost class which adds the ability to
represent that an operation cannot be costed, i.e. operations that cannot
be expanded by the code-generator will have an invalid cost.
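
As a rough illustration of the idea (a toy stand-in, not the actual
InstructionCost implementation; the name ToyCost and its members are made
up for this sketch), a cost that is either a number or 'invalid' could look
like this, with the invalid state surviving addition:

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for the idea behind InstructionCost (NOT LLVM's class):
// a cost is either a number or "invalid", meaning "cannot be costed".
class ToyCost {
  std::optional<unsigned> Value; // empty means invalid

  ToyCost() = default; // only reachable through invalid()

public:
  ToyCost(unsigned V) : Value(V) {}

  static ToyCost invalid() { return ToyCost(); }

  bool isValid() const { return Value.has_value(); }

  unsigned value() const {
    assert(isValid() && "querying the value of an invalid cost");
    return *Value;
  }

  // In this sketch invalid is sticky: a sum that includes an invalid
  // cost is itself invalid.
  ToyCost operator+(const ToyCost &RHS) const {
    if (!isValid() || !RHS.isValid())
      return invalid();
    return ToyCost(*Value + *RHS.Value);
  }
};
```

With this, the cost of a whole loop body is invalid as soon as one of its
operations cannot be costed.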

We started using this information in the Loop Vectorizer for scalable
auto-vectorization. The LV has a legality stage and a cost-model stage,
which are conceptually separate and serve different purposes. But with the
introduction of valid/invalid costs it becomes tempting to use the
cost-model as 'legalisation', which leads us to the following question:

   Should we be using the cost-model to do legalisation?

'Legalisation' in this context means asking up front whether the
code-generator can handle the IR emitted by the LV. Examples of
operations that need such legalisation are predicated divides (at least
until we can use the llvm.vp intrinsics), or intrinsic calls that have no
scalable-vector equivalent. For fixed-width vectors this legalisation issue
is mostly moot, since operations on fixed-width vectors can be scalarised.
For scalable vectors this is neither supported nor feasible [1].

This means there's the option to do one of two things:

[Option 1]

Add checks to the LV legalisation to see if scalable-vectorisation is
feasible. If so, assert the cost must be valid. Otherwise discard scalable
VFs as possible candidates.
 * This has the benefit that the compiler can avoid
   calculating/considering VPlans that we know cannot be costed.
 * Legalisation and cost-model keep each other in check. If something
   cannot be costed then either the cost-model or legalisation was
   incomplete.
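
A minimal sketch of what Option 1 amounts to, using a toy model rather than
the real LV code. Everything here (Op, VF, canCodegen, costOf,
selectVFOption1) is hypothetical and the cost numbers are made up;
std::nullopt stands in for an Invalid cost, and vscale is treated as 1 when
comparing candidates:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model for illustration only; none of these names are LLVM APIs.
enum class Op { Add, PredicatedDiv, CallWithoutScalableVariant };

struct VF {
  unsigned MinLanes; // N in "N" or "vscale x N"
  bool Scalable;
};

// Hypothetical legality query: can the code-generator handle O at this VF?
// Fixed-width vectors can always fall back to scalarisation.
bool canCodegen(Op O, VF F) { return !F.Scalable || O == Op::Add; }

// Hypothetical cost query; std::nullopt models an Invalid cost.
std::optional<unsigned> costOf(Op O, VF F) {
  if (F.Scalable && O != Op::Add)
    return std::nullopt;
  return O == Op::Add ? 1u : 4u * F.MinLanes;
}

// Option 1: discard illegal VFs first, then require every cost to be valid.
std::optional<VF> selectVFOption1(const std::vector<Op> &Loop,
                                  const std::vector<VF> &Candidates) {
  std::optional<VF> Best;
  double BestPerLane = 0.0;
  for (VF F : Candidates) {
    bool Legal = true;
    for (Op O : Loop)
      Legal = Legal && canCodegen(O, F);
    if (!Legal)
      continue; // discarded up front, never costed

    unsigned Total = 0;
    for (Op O : Loop) {
      std::optional<unsigned> C = costOf(O, F);
      // If this fires, either legality or the cost-model is incomplete.
      assert(C && "legal VF must have a valid cost");
      Total += *C;
    }

    // Compare by cost per lane, treating vscale as 1 for simplicity.
    double PerLane = static_cast<double>(Total) / F.MinLanes;
    if (!Best || PerLane < BestPerLane) {
      Best = F;
      BestPerLane = PerLane;
    }
  }
  return Best;
}
```

The assert is the point of this option: legality and cost-model check each
other, so a disagreement between the two shows up immediately.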

[Option 2]

Leave the question about legalisation to the CostModel, i.e. if the
CostModel says that <operation> for `VF=vscale x N` is Invalid, then avoid
selecting that VF.
 * This has the benefit that we don't need to do work up-front to
   discard scalable VFs, keeping the LV design simpler.
 * This makes gaps in the cost-model more difficult to spot.
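
A comparable sketch for Option 2, again a toy model and not the real LV
code. The names (Op, VF, costOf, selectVFOption2) are hypothetical, the
cost numbers are made up, std::nullopt stands in for an Invalid cost, and
vscale is treated as 1 when comparing candidates:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model for illustration only; none of these names are LLVM APIs.
enum class Op { Add, PredicatedDiv, CallWithoutScalableVariant };

struct VF {
  unsigned MinLanes; // N in "N" or "vscale x N"
  bool Scalable;
};

// Hypothetical cost query; std::nullopt models an Invalid cost.
std::optional<unsigned> costOf(Op O, VF F) {
  if (F.Scalable && O != Op::Add)
    return std::nullopt;
  return O == Op::Add ? 1u : 4u * F.MinLanes;
}

// Option 2: no separate legality step; an Invalid cost rejects the VF.
std::optional<VF> selectVFOption2(const std::vector<Op> &Loop,
                                  const std::vector<VF> &Candidates) {
  std::optional<VF> Best;
  double BestPerLane = 0.0;
  for (VF F : Candidates) {
    assert(F.MinLanes > 0 && "VF must have at least one lane");

    unsigned Total = 0;
    bool Valid = true;
    for (Op O : Loop) {
      std::optional<unsigned> C = costOf(O, F);
      if (!C) {
        Valid = false; // one uncostable operation invalidates the VF
        break;
      }
      Total += *C;
    }
    if (!Valid)
      continue; // silently skipped

    // Compare by cost per lane, treating vscale as 1 for simplicity.
    double PerLane = static_cast<double>(Total) / F.MinLanes;
    if (!Best || PerLane < BestPerLane) {
      Best = F;
      BestPerLane = PerLane;
    }
  }
  return Best;
}
```

Note that in this shape a missing cost-model entry looks exactly like a
genuinely unsupported operation: in both cases the VF is simply skipped.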

Note that it's not useful to combine Option 1 and Option 2: once the
cost-model is also allowed to reject a VF, there is no longer any need to
do legalisation beforehand, so combining them is effectively a choice for
Option 2.

Both approaches lead to the same end-result, but we currently have a few
patches in flight that have taken Option 1, and this led to some questions
about the approach from both Florian and David Green. So we're looking to
reach a consensus and a decision on which way to move forward.

I've tentatively added this as a topic to the agenda of the upcoming LLVM
SVE/Scalable Vector Sync-up meeting next Tuesday (June 15th, [2]) as an
opportunity to discuss this more freely if we can get enough people who
actively work on the LV together in that meeting (like Florian and David,
although please forward to anyone else who might have input on this).

Thanks,

Sander

[1] Expanding the vector operation into a scalarisation loop is currently
    not supported. It could be done, and we have done extensive
    experimentation with loops that handle each element of a scalable
    vector sequentially, but this has never proved beneficial, even when
    using special instructions to efficiently increment the predicate
    vector. I doubt this will be any different for other scalable vector
    architectures, because of the loop control overhead. Also the 
    insertion/extraction of elements from a scalable vector is unlikely to
    be as cheap as for fixed-width vectors.

[2] https://docs.google.com/document/d/1UPH2Hzou5RgGT8XfO39OmVXKEibWPfdYLELSaHr3xzo/edit?usp=sharing