[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Robin Kruppe via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 31 13:17:17 PDT 2018


On 31 July 2018 at 21:10, David A. Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
>> Hi David,
>>
>> Let me put the last two comments up:
>>
>>> > But we're trying to represent slightly different techniques
>>> > (predication, vscale change) which need to be tied down to only
>>> > exactly what they do.
>>>
>>> Wouldn't intrinsics to change vscale do exactly that?
>>
>> You're right. I've been using the same overloaded term and this is
>> probably what caused the confusion.
>
> Me too.  Thanks Robin for clarifying this for all of us!  I'll try to
> follow this terminology:
>
> VL/active vector length - The software notion of how many elements to
>                           operate on; a special case of predication
>
> vscale - The hardware notion of how big a vector register is
>
> TL;DR - Changing VL in a function doesn't affect anything about this
>         proposal, but changing vscale might.  Changing VL shouldn't
>         impact things like ISel at all but changing vscale might.
>         Changing vscale is (much) more difficult than changing VL.

Great, seems like we're all in violent agreement that VL changes are a
non-issue for the discussion at hand.
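
To make the terminology concrete, here is a scalar C sketch of the
two mechanisms (MAXVL and the function names are made-up
illustrations of the software model, not how a backend would
actually lower anything):

```c
#include <stddef.h>

#define MAXVL 4  /* stand-in for the hardware register length */

/* VL-style strip-mining: each trip picks an active vector length vl
 * and processes exactly vl elements. */
void add1_vl(int *a, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t vl = (n - i < MAXVL) ? n - i : MAXVL; /* software VL */
        for (size_t lane = 0; lane < vl; lane++)
            a[i + lane] += 1;
        i += vl;
    }
}

/* Predication-style: every trip runs at full width, with a per-lane
 * predicate disabling the out-of-bounds lanes. */
void add1_masked(int *a, size_t n) {
    for (size_t i = 0; i < n; i += MAXVL)
        for (size_t lane = 0; lane < MAXVL; lane++)
            if (i + lane < n)   /* lane predicate */
                a[i + lane] += 1;
}
```

For a loop like this the two are observably identical, which is the
"semantically equivalent" case discussed below; vscale (the size of
the register itself, MAXVL here) never changes in either version.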

>> In some cases, predicating and shortening the vectors are semantically
>> equivalent. In this case, the IR should also be equivalent.
>> Instructions/intrinsics that handle predication could be used by the
>> backend to simply change VL instead, as long as it's guaranteed that
>> the semantics are identical. There are no problems here.
>
> Right.  Changing VL is no problem.  I think even reducing vscale is ok
> from an IR perspective, if a little strange.
>
>> In other cases, for example widening or splitting the vector, or cases
>> we haven't thought of yet, the semantics are not the same, and having
>> them in IR would be bad. I think we're all in agreement on that.
>
> You mean going from a shorter active vector length to a longer active
> vector length?  Or smaller vscale to larger vscale?  The latter would be
> bad.  The former seems ok if the dataflow is captured and the vectorizer
> generates correct code to account for it.  Presumably it would if it is
> the thing changing the active vector length.
>
>> All I'm asking is that we make a list of what we want to happen and
>> disallow everything else explicitly, until someone comes with a strong
>> case for it. Makes sense?
>
> Yes.
>
>>> Ok, I think I am starting to grasp what you are saying.  If a value
>>> flows from memory or some scalar computation to vector and then back to
>>> memory or scalar, VL should only ever be set at the start of the vector
>>> computation until it finishes and the value is deposited in memory or
>>> otherwise extracted.  I think this is ok, but note that any vector
>>> functions called may change VL for the duration of the call.  The change
>>> would not be visible to the caller.
>>
>> If a function is called and changes the length, does it restore back on return?
>
> If a function changes VL, it would typically restore it before return.
> This would be an ABI guarantee just like any other callee-save register.
>
> If a function changes vscale, I don't know.  The RISC-V people seem to
> have thought the most about this.  I have no point of reference here.
>
>> Right, so it's not as clear cut as I hoped. But we can start
>> implementing the basic idea and then expand as we go. I think trying
>> to hash out all potential scenarios now will drive us crazy.
>
> Sure.
>
>>> It seems strange to me for an optimizer to operate in such a way.  The
>>> optimizer should be fully aware of the target's capabilities and use
>>> them accordingly.
>>
>> Mid-end optimisers tend to be fairly agnostic. And when not, they
>> usually ask "is this supported" instead of "which one is better".
>
> Yes, the "is this supported" question is common.  Isn't the whole point
> of VPlan to get the "which one is better" question answered for
> vectorization?  That would be necessarily tied to the target.  The
> questions asked can be agnostic, like the target-agnostic bits of
> codegen use, but the answers would be target-specific.

Just like the old loop vectorizer, VPlan will need a cost model that
is based on properties of the target, exposed to the optimizer in the
form of e.g. TargetLowering hooks. But we should try really hard to
avoid having a hard distinction between e.g. predication- and VL-based
loops in the VPlan representation. Duplicating or triplicating
vectorization logic would be really bad, and there are a lot of
similarities that we can exploit to avoid that. For a simple example,
SVE and RVV both want the same basic loop skeleton: strip-mining with
predication of the loop body derived from the induction variable.
Hopefully we can have a 99% unified VPlan pipeline and most
differences can be delegated to the final VPlan->IR step and the
respective backends.
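
As a sketch of that shared skeleton: the predicate both kinds of
target want is just a lane mask computed from the induction variable
and the trip count (the fixed VF and the function name here are
illustrative assumptions, not an actual VPlan interface):

```c
#include <stddef.h>
#include <stdint.h>

#define VF 4  /* fixed vectorization factor, for illustration only */

/* The mask derived from the induction variable: bit `lane` is set
 * iff element i + lane is inside the trip count n. */
uint8_t active_lane_mask(size_t i, size_t n) {
    uint8_t m = 0;
    for (size_t lane = 0; lane < VF; lane++)
        if (i + lane < n)
            m |= (uint8_t)(1u << lane);
    return m;
}
```

A predicating target consumes the mask directly; a VL-based target
can derive the active length from it, since the set bits always form
a prefix. That is why one strip-mined skeleton can serve both.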

+ Diego, Florian and others that have been discussing this previously

>>> ARM seems to have no difficulty selecting instructions for it.  Changing
>>> the value of vscale shouldn't impact ISel at all.  The same instructions
>>> are selected.
>>
>> I may very well be getting lost in too many floating future ideas, atm. :)
>
> Given our clearer terminology, my statement above is maybe not correct.
> Changing vscale *would* impact the IR and codegen (stack allocation,
> etc.).  Changing VL would not, other than adding some Instructions to
> capture the semantics.  I suspect neither would change ISel (I know VL
> would not) but as you say I don't think we need concern ourselves with
> changing vscale right now, unless others have a dire need to support it.
>
>>> > It is, but IIGIR, changing vscale and predicating are similar
>>> > transformations to achieve the similar goals, but will not be
>>> > represented the same way in IR.
>>>
>>> They probably will not be represented the same way, though I think they
>>> could be (but probably shouldn't be).
>>
>> Maybe in the simple cases (like last iteration) they should be?
>
> Perhaps changing VL could be modeled the same way but I have a feeling
> it will be awkward.  Changing vscale is something totally different and
> likely should be represented differently if allowed at all.
>
>>> Ok, but would the optimizer be prevented from introducing VL changes?
>>
>> In the case where they're represented in similar ways in IR, it
>> wouldn't need to.
>
> It would have to generate IR code to effect the software change in VL
> somehow, by altering predicates or by using special intrinsics or some
> other way.
>
>> Otherwise, we'd have to teach the two methods to IR optimisers that
>> are virtually identical in semantics. It'd be left for the back end to
>> implement the last iteration notation as a predicate fill or a vscale
>> change.
>
> I suspect that is too late.  The vectorizer needs to account for the
> choice and pick the most profitable course.  That's one of the reasons I
> think modeling VL changes like predicates is maybe unnecessarily
> complex.  If VL is modeled as "just another predicate" then there's no
> guarantee that ISel will honor the choices the vectorizer made to use VL
> over predication.  If it's modeled explicitly, ISel should have an
> easier time generating the code the vectorizer expects.
>
> VL changes aren't always on the last iteration.  The Cray X1 had an
> instruction (I would have to dust off old manuals to remember the
> mnemonic) with somewhat strange semantics to get the desired VL for an
> iteration.  Code would look something like this:
>
> loop top:
>   vl = getvl N      #  N contains the number of iterations left
>   <do computation>
>   N = N - vl
>   branch N > 0, loop top
>
> The "getvl" instruction would usually return the full hardware vector
> register length (MAXVL), except on the 2nd-to-last iteration if N was
> larger than MAXVL but less than 2*MAXVL it would return something like
> <N % 2 == 0 ? N/2 : N/2 + 1> (i.e. ceil(N/2)), so in the range
> (0, MAXVL].  The last
> iteration would then run at the same VL or one less depending on whether
> N was odd or even.  So the last two iterations would often run at less
> than MAXVL and often at different VLs from each other.

FWIW this is exactly how the RISC-V vector unit works --
unsurprisingly, since it owes a lot to Cray-style processors :)
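
If I'm reading that description right, the getvl rule can be modeled
in scalar C like so (MAXVL and the function name are stand-ins; the
ceil(N/2) split is taken straight from the quoted explanation):

```c
#include <stddef.h>

#define MAXVL 8  /* stand-in for the hardware maximum vector length */

/* Return the VL for this trip given N elements remaining: MAXVL when
 * plenty remain, ceil(N/2) on the 2nd-to-last trip when
 * MAXVL < N < 2*MAXVL, and N itself on the last trip. */
size_t getvl(size_t n) {
    if (n >= 2 * MAXVL) return MAXVL;
    if (n > MAXVL)      return (n + 1) / 2;  /* ceil(N/2) */
    return n;
}
```

E.g. with MAXVL = 8 and N = 19 the trips run at VL = 8, 6, 5: the
last two trips run at nearly equal lengths, both below MAXVL, as
described above.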

> And no, I don't know why the hardware operated this way.  :)
>
>>> Being conservative is fine, but we should have a clear understanding of
>>> exactly what that means.  I would not want to prohibit all VL changes
>>> now and forever, because I see that as unnecessarily restrictive and
>>> possibly damaging to supporting future architectures.
>>>
>>> If we don't want to provide intrinsics for changing VL right now, I'm
>>> all in favor.  There would be no reason to add error checks because
>>> there would be no way within the IR to change VL.
>>
>> Right, I think we're converging.
>
> Agreed.

+1, there is no need to deal with VL at all at this point. I would
even say there is no concept of VL in IR at all at this time.

At some point in the future I will propose something in this space to
support RISC-V vectors, but we'll cross that bridge when we come to
it.

>> How about we don't forbid changes in vscale, but we find a common
>> notation for all the cases where predicating and changing vscale would
>> be semantically identical, and implement those in the same way.
>>
>> Later on, if there are additional cases where changes in vscale would
>> be beneficial, we can discuss them independently.
>>
>> Makes sense?
>
> Again trying to use the VL/vscale terminology:
>
> Changing vscale - no IR support currently and less likely in the future
> Changing VL     - no IR support currently but more likely in the future
>
> The second seems like a straightforward extension to me.  There will be
> some questions about how to represent VL semantics in IR but those don't
> impact the proposal under discussion at all.
>
> The first seems much harder, at least within a function.  It may or may
> not impact the proposal under discussion.  It sounds like the RISC-V
> people have some use cases so those should probably be the focal point
> of this discussion.

Yes, for RISC-V we definitely need vscale to vary a bit, but are fine
with limiting that to function boundaries. The use case is *not*
"changing how large vectors are" in the middle of a loop or something
like that, which we all agree is very dubious at best. The RISC-V
vector unit is just very configurable (number of registers, vector
element sizes, etc.) and this configuration can impact how large the
vector registers are. For any given vectorized loop nest we want to
configure the vector unit to suit that piece of code and run the loop
with whatever register size that configuration yields. And when that
loop is done, we stop using the vector unit entirely and disable it,
so that the next loop can use it differently, possibly with a
different register size. For IR modeling purposes, I propose to
enlarge "loop nest" to "function"; the same principle applies, it
just means all vectorized loops in the function will have to share a
configuration.
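
Purely to illustrate the kind of dependence I mean (the numbers and
names below are invented for the sketch, not the actual RISC-V
encoding): a fixed pool of vector state gets divided up by the
per-loop configuration, so fewer enabled registers or narrower
elements yield longer vectors.

```c
#include <stddef.h>

#define VECTOR_STATE_BITS 4096  /* hypothetical total vector state */

/* Toy model: effective elements per vector register under a given
 * configuration (how many registers are enabled, how wide each
 * element is). Different configurations => different vscale. */
size_t elems_per_reg(size_t nregs_enabled, size_t elem_bits) {
    return VECTOR_STATE_BITS / nregs_enabled / elem_bits;
}
```

Under this toy model, a loop needing only 8 registers of 32-bit
elements would see 16-element vectors, while one using all 32
registers of 64-bit elements would see 2-element vectors; that is
the per-function variation in vscale I mean.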

Without getting too far into the details, does this make sense as a use case?


Cheers,
Robin

>                            -David
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

