[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

David A. Greene via llvm-dev <llvm-dev at lists.llvm.org>
Tue Jul 31 12:10:18 PDT 2018


Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Hi David,
>
> Let me put the last two comments up:
>
>> > But we're trying to represent slightly different techniques
>> > (predication, vscale change) which need to be tied down to only
>> > exactly what they do.
>>
>> Wouldn't intrinsics to change vscale do exactly that?
>
> You're right. I've been using the same overloaded term and this is
> probably what caused the confusion.

Me too.  Thanks Robin for clarifying this for all of us!  I'll try to
follow this terminology:

VL/active vector length - The software notion of how many elements to
                          operate on; a special case of predication

vscale - The hardware notion of how big a vector register is

TL;DR - Changing VL in a function doesn't affect anything about this
        proposal, but changing vscale might.  Changing VL shouldn't
        impact things like ISel at all but changing vscale might.
        Changing vscale is (much) more difficult than changing VL.
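
To make the distinction concrete, here is a minimal IR sketch using the
scalable-vector type from the proposal.  The names, and the assumption
that the predicate %vl.mask covers exactly the first VL lanes, are mine,
purely for illustration:

  ; vscale is baked into the type: this load reads vscale * 4 i32
  ; elements, where the multiple is unknown at compile time but fixed
  ; at run time.
  %a = load <vscale x 4 x i32>, <vscale x 4 x i32>* %pa
  %b = load <vscale x 4 x i32>, <vscale x 4 x i32>* %pb

  ; VL is purely a software notion: only the first VL lanes are
  ; "active", expressed here as an ordinary predicated (masked)
  ; operation.
  %sum = add <vscale x 4 x i32> %a, %b
  %res = select <vscale x 4 x i1> %vl.mask, <vscale x 4 x i32> %sum, <vscale x 4 x i32> %a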

> In some cases, predicating and shortening the vectors are semantically
> equivalent. In this case, the IR should also be equivalent.
> Instructions/intrinsics that handle predication could be used by the
> backend to simply change VL instead, as long as it's guaranteed that
> the semantics are identical. There are no problems here.

Right.  Changing VL is no problem.  I think even reducing vscale is ok
from an IR perspective, if a little strange.
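
For the memory side, that equivalence is easy to see with a masked
store.  This is only a sketch, assuming the existing masked intrinsics
get scalable-type variants as the proposal suggests; %tail.mask is an
illustrative name for a predicate covering the first VL lanes:

  ; Whether you read this as "a full-width store predicated on
  ; %tail.mask" or as "a store executed at VL", exactly VL elements
  ; reach memory.
  call void @llvm.masked.store.nxv4i32.p0nxv4i32(<vscale x 4 x i32> %val, <vscale x 4 x i32>* %ptr, i32 4, <vscale x 4 x i1> %tail.mask)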

> In other cases, for example widening or splitting the vector, or cases
> we haven't thought of yet, the semantics are not the same, and having
> them in IR would be bad. I think we're all in agreement on that.

You mean going from a shorter active vector length to a longer active
vector length?  Or smaller vscale to larger vscale?  The latter would be
bad.  The former seems ok if the dataflow is captured and the vectorizer
generates correct code to account for it.  Presumably it would, since
it is the thing changing the active vector length.

> All I'm asking is that we make a list of what we want to happen and
> disallow everything else explicitly, until someone comes with a strong
> case for it. Makes sense?

Yes.

>> Ok, I think I am starting to grasp what you are saying.  If a value
>> flows from memory or some scalar computation to vector and then back to
>> memory or scalar, VL should only ever be set at the start of the vector
>> computation until it finishes and the value is deposited in memory or
>> otherwise extracted.  I think this is ok, but note that any vector
>> functions called may change VL for the duration of the call.  The change
>> would not be visible to the caller.
>
> If a function is called and changes the length, does it restore back on return?

If a function changes VL, it would typically restore it before return.
This would be an ABI guarantee just like any other callee-save register.

If a function changes vscale, I don't know.  The RISC-V people seem to
have thought the most about this.  I have no point of reference here.

> Right, so it's not as clear cut as I hoped. But we can start
> implementing the basic idea and then expand as we go. I think trying
> to hash out all potential scenarios now will drive us crazy.

Sure.

>> It seems strange to me for an optimizer to operate in such a way.  The
>> optimizer should be fully aware of the target's capabilities and use
>> them accordingly.
>
> Mid-end optimisers tend to be fairly agnostic. And when not, they
> usually ask "is this supported" instead of "which one is better".

Yes, the "is this supported" question is common.  Isn't the whole point
of VPlan to get the "which one is better" question answered for
vectorization?  That would necessarily be tied to the target.  The
questions asked can be agnostic, like the target-agnostic bits of
codegen use, but the answers would be target-specific.

>> ARM seems to have no difficulty selecting instructions for it.  Changing
>> the value of vscale shouldn't impact ISel at all.  The same instructions
>> are selected.
>
> I may very well be getting lost in too many floating future ideas, atm. :)

Given our clearer terminology, my statement above is maybe not correct.
Changing vscale *would* impact the IR and codegen (stack allocation,
etc.).  Changing VL would not, other than adding some Instructions to
capture the semantics.  I suspect neither would change ISel (I know VL
would not) but as you say I don't think we need concern ourselves with
changing vscale right now, unless others have a dire need to support it.
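
The stack allocation point is the easiest one to see.  Consider an
alloca of a scalable type (illustrative only):

  ; This slot occupies vscale * 16 bytes.  If vscale could change
  ; mid-function, the slot's size -- and the placement of everything
  ; else in the frame -- would shift underneath us.
  %slot = alloca <vscale x 4 x i32>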

>> > It is, but IIGIR, changing vscale and predicating are similar
>> > transformations to achieve similar goals, but will not be
>> > represented the same way in IR.
>>
>> They probably will not be represented the same way, though I think they
>> could be (but probably shouldn't be).
>
> Maybe in the simple cases (like last iteration) they should be?

Perhaps changing VL could be modeled the same way but I have a feeling
it will be awkward.  Changing vscale is something totally different and
likely should be represented differently if allowed at all.

>> Ok, but would the optimizer be prevented from introducing VL changes?
>
> In the case where they're represented in similar ways in IR, it
> wouldn't need to.

It would have to generate IR to effect the software change in VL
somehow, whether by altering predicates, by using special intrinsics, or
by some other means.
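
For instance, a vectorizer-introduced VL change might come out roughly
like the sketch below.  @make.active.mask is a placeholder name standing
in for whichever mechanism (a stepvector-and-compare sequence, a
dedicated intrinsic, or something else) ends up being chosen:

  vector.body:
    %i = phi i64 [ 0, %entry ], [ %i.next, %vector.body ]
    ; Predicate enabling only the lanes with %i + lane < %n, i.e. this
    ; iteration's software VL.
    %mask = call <vscale x 4 x i1> @make.active.mask(i64 %i, i64 %n)
    %p = getelementptr i32, i32* %base, i64 %i
    %pv = bitcast i32* %p to <vscale x 4 x i32>*
    %v = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* %pv, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> undef)
    ; ... compute on %v under %mask, masked store of the result ...
    %i.next = add i64 %i, %step   ; %step = vscale * 4, however obtained
    %more = icmp ult i64 %i.next, %n
    br i1 %more, label %vector.body, label %exit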

> Otherwise, we'd have to teach the two methods to IR optimisers that
> are virtually identical in semantics. It'd be left for the back end to
> implement the last iteration notation as a predicate fill or a vscale
> change.

I suspect that is too late.  The vectorizer needs to account for the
choice and pick the most profitable course.  That's one of the reasons I
think modeling VL changes like predicates is maybe unnecessarily
complex.  If VL is modeled as "just another predicate" then there's no
guarantee that ISel will honor the choices the vectorizer made to use VL
over predication.  If it's modeled explicitly, ISel should have an
easier time generating the code the vectorizer expects.

VL changes aren't always on the last iteration.  The Cray X1 had an
instruction (I would have to dust off old manuals to remember the
mnemonic) with somewhat strange semantics to get the desired VL for an
iteration.  Code would look something like this:

loop top:
  vl = getvl N      #  N contains the number of iterations left
  <do computation>  #  operates on the first vl elements
  N = N - vl
  branch N > 0, loop top

The "getvl" instruction would usually return the full hardware vector
register length (MAXVL), except on the 2nd-to-last iteration if N was
larger than MAXVL but less than 2*MAXVL it would return something like
<N % 2 == 0 ? N/2 : N/2 + 1>, so in the range (0, MAXVL).  The last
iteration would then run at the same VL or one less depending on whether
N was odd or even.  So the last two iterations would often run at less
than MAXVL and often at different VLs from each other.

And no, I don't know why the hardware operated this way.  :)
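
For what it's worth, if that pattern were expressed in IR it might look
roughly like the sketch below; @getvl and @active.mask are made-up names
for illustration, not proposed intrinsics:

  loop:
    %n = phi i64 [ %n.init, %entry ], [ %n.next, %loop ]
    %vl = call i64 @getvl(i64 %n)                         ; this iteration's VL
    %mask = call <vscale x 4 x i1> @active.mask(i64 %vl)  ; first %vl lanes active
    ; <do computation, predicated on %mask>
    %n.next = sub i64 %n, %vl
    %more = icmp sgt i64 %n.next, 0
    br i1 %more, label %loop, label %exit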

>> Being conservative is fine, but we should have a clear understanding of
>> exactly what that means.  I would not want to prohibit all VL changes
>> now and forever, because I see that as unnecessarily restrictive and
>> possibly damaging to supporting future architectures.
>>
>> If we don't want to provide intrinsics for changing VL right now, I'm
>> all in favor.  There would be no reason to add error checks because
>> there would be no way within the IR to change VL.
>
> Right, I think we're converging.

Agreed.

> How about we don't forbid changes in vscale, but we find a common
> notation for all the cases where predicating and changing vscale would
> be semantically identical, and implement those in the same way.
>
> Later on, if there are additional cases where changes in vscale would
> be beneficial, we can discuss them independently.
>
> Makes sense?

Again trying to use the VL/vscale terminology:

Changing vscale - no IR support currently and less likely in the future
Changing VL     - no IR support currently but more likely in the future

The second seems like a straightforward extension to me.  There will be
some questions about how to represent VL semantics in IR but those don't
impact the proposal under discussion at all.

The first seems much harder, at least within a function.  It may or may
not impact the proposal under discussion.  It sounds like the RISC-V
people have some use cases so those should probably be the focal point
of this discussion.

                           -David

