[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Bruce Hoult via llvm-dev llvm-dev at lists.llvm.org
Thu Jun 7 15:31:43 PDT 2018


On Fri, Jun 8, 2018 at 4:10 AM, Graham Hunter via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi,
>
> > On 6 Jun 2018, at 17:36, David A. Greene <dag at cray.com> wrote:
> >
> > Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> writes:
> >
> >>> Ok, now I understand what you're getting at.  A ConstantExpr would
> >>> encapsulate this computation.  We already have "non-static-constant"
> >>> values for ConstantExpr like sizeof and offsetof.  I would see
> >>> VScaleConstant in that same tradition.  In your struct example,
> >>> getSizeExpressionInBits would return:
> >>>
> >>> add(mul(256, vscale), 64)
> >>>
> >>> Does that satisfy your needs?
> >>
> >> Ah, I think the use of 'expression' in the name definitely confuses the
> >> issue then. This isn't for expressing the size in IR, where you would
> >> indeed just multiply by vscale and add any fixed-length size.
> >
> > Ok, thanks for clarifying.  The use of "expression" is confusing.
> >
> >> This is for the analysis code around the IR -- lots of code asks for the
> >> size of a Type in bits to determine what it can do to a Value with that
> >> type. Some of them are specific to scalar Types, like determining whether
> >> a sign/zero extend is needed. Others would apply to vector types
> >> (including scalable vectors), such as checking whether two Types have the
> >> exact same size so that a bitcast can be used instead of a more expensive
> >> operation like copying to memory and back to convert.
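
Just to check I'm following: here's a rough sketch of what that two-part size
might look like, with hypothetical names rather than the RFC's actual API, and
the "exact same size" test a bitcast would need:

#include <cstdint>

// Hypothetical representation: a size is UnscaledBits + ScaledMinBits * vscale,
// with vscale >= 1 unknown until runtime.
struct ScalableSize {
  uint64_t UnscaledBits;  // fixed-length contribution, in bits
  uint64_t ScaledMinBits; // contribution multiplied by vscale, in bits
};

// A bitcast is only trivially size-preserving if both components match exactly.
bool hasSameSize(const ScalableSize &A, const ScalableSize &B) {
  return A.UnscaledBits == B.UnscaledBits &&
         A.ScaledMinBits == B.ScaledMinBits;
}
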
> >
> > If this method returns two integers, how does LLVM interpret the
> > comparison?  If the return value is { <unscaled>, <scaled> } then how
> > do, say { 1024, 0 } and { 0, 128 } compare?  Doesn't it depend on the
> > vscale?  They could be the same size or not, depending on the target
> > characteristics.
>
> I did have a paragraph on that in the RFC, but perhaps a list would be
> a better format (assuming X, Y, etc. are non-zero):
>
> { X, 0 } <cmp> { Y, 0 }: Normal unscaled comparison.
>
> { 0, X } <cmp> { 0, Y }: Normal comparison within a function, or across
>                          functions that inherit vector length. Cannot be
>                          compared across non-inheriting functions.
>
> { X, 0 } > { 0, Y }: Cannot return true.
>
> { X, 0 } = { 0, Y }: Cannot return true.
>
> { X, 0 } < { 0, Y }: Can return true.
>
> { Xu, Xs } <cmp> { Yu, Ys }: Gets complicated, need to subtract common
>                              terms and try the above comparisons; it
>                              may not be possible to get a good answer.
>
> I don't know if we need a 'maybe' result for cases comparing scaled
> vs. unscaled; I believe the gcc implementation of SVE allows for such
> results, but that supports a generic polynomial length representation.
>
> I think in code, we'd have an inline function to deal with the first case
> and a likely-not-taken call to a separate function to handle all the
> scalable cases.
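
As a strawman for that split, reusing the ScalableSize sketch from earlier in
this mail (hypothetical names again, and I've included an 'Unknown' result for
the mixed cases even though, as you say, it may turn out to be unnecessary):

enum class SizeCompare { Smaller, Equal, Larger, Unknown };

// Out-of-line slow path for anything with a scaled component: subtract the
// common terms and check whether the ordering holds for every vscale >= 1.
// Assumes both sizes refer to the same vscale (same function, or functions
// that inherit the vector length).
SizeCompare compareScalableSizes(const ScalableSize &A, const ScalableSize &B) {
  int64_t DiffU = (int64_t)B.UnscaledBits - (int64_t)A.UnscaledBits;
  int64_t DiffS = (int64_t)B.ScaledMinBits - (int64_t)A.ScaledMinBits;
  if (DiffU == 0 && DiffS == 0)
    return SizeCompare::Equal;
  // B - A == DiffU + DiffS * vscale; when DiffS >= 0 it is smallest at
  // vscale == 1, so checking there proves A < B for every possible vscale.
  if (DiffS >= 0 && DiffU + DiffS > 0)
    return SizeCompare::Smaller;
  // Symmetric case: A - B is smallest at vscale == 1 when DiffS <= 0.
  if (DiffS <= 0 && DiffU + DiffS < 0)
    return SizeCompare::Larger;
  // Otherwise the answer depends on the actual value of vscale.
  return SizeCompare::Unknown;
}

// Inline fast path for the common, fully unscaled case.
inline SizeCompare compareSizes(const ScalableSize &A, const ScalableSize &B) {
  if (A.ScaledMinBits == 0 && B.ScaledMinBits == 0) {
    if (A.UnscaledBits == B.UnscaledBits)
      return SizeCompare::Equal;
    return A.UnscaledBits < B.UnscaledBits ? SizeCompare::Smaller
                                           : SizeCompare::Larger;
  }
  return compareScalableSizes(A, B);
}

For the earlier { 1024, 0 } vs { 0, 128 } example this yields Unknown, which
matches the "depends on vscale" intuition.
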
>
> > Are bitcasts between scaled types and non-scaled types disallowed?  I
> > could certainly see an argument for disallowing it.  I could argue that
> > for bitcasting purposes that the unscaled and scaled parts would have to
> > exactly match in order to do a legal bitcast.  Is that the intent?
>
> I would propose disallowing bitcasts, but allowing extracting a subvector
> if the minimum number of scaled bits matches the number of unscaled bits.
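
If I'm reading that right, something along these lines (again just a sketch
using the hypothetical ScalableSize from earlier):

// Extracting a fixed-width subvector from a scalable vector would be allowed
// when the scalable type's minimum size matches the fixed type exactly, since
// at least that many bits exist for any vscale >= 1.
bool canExtractFixedSubvector(const ScalableSize &Scalable,
                              const ScalableSize &Fixed) {
  return Fixed.ScaledMinBits == 0 && Scalable.UnscaledBits == 0 &&
         Scalable.ScaledMinBits == Fixed.UnscaledBits;
}
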
>
> >
> >>> Is there anything about vscale or a scalable vector that requires a
> >>> minimum bit width?  For example, is this legal?
> >>>
> >>> <scalable 1 x double>
> >>>
> >>> I know it won't map to an SVE type.  I'm simply curious because
> >>> traditionally Cray machines defined vectors in terms of
> >>> machine-dependent "maxvl" with an element type, so with the above vscale
> >>> would == maxvl.  Not that we make any such things anymore.  But maybe
> >>> someone else does?
> >>
> >> That's legal in IR, yes, and we believe it should be usable to represent
> >> the vectors for RISC-V's 'V' extension. The main problem there is that
> >> they have a dynamic vector length within the loop, so that the final
> >> iterations of a loop can be performed within vector registers when there's
> >> less than a full register's worth of data remaining. SVE uses predication
> >> (masking) to achieve the same effect.
> >>
> >> For the 'V' extension, vscale would indeed correspond to 'maxvl', and I'm
> >> hoping that a 'setvl' intrinsic that provides a predicate will avoid the
> >> need for modelling a change in dynamic vector length -- reducing the
> >> vector length is effectively equivalent to an implied predicate on all
> >> operations. This avoids needing to add a token operand to all existing
> >> instructions that work on vector types.
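
That implied-predicate reading makes sense to me. Purely as a conceptual
scalar model of it (illustrative C++, not IR; VLMAX, VL and the mask names
are stand-ins, not proposed spellings):

#include <cstddef>

// Conceptual model only: a reduced active vector length VL acts like an extra
// predicate over the VLMAX lanes, ANDed with whatever mask the user code has,
// so no change to the vector type itself is needed.
void maskedAdd(const int *A, const int *B, int *Dst, const bool *CondMask,
               size_t VLMAX, size_t VL) {
  for (size_t Lane = 0; Lane < VLMAX; ++Lane) {
    bool ImpliedPred = Lane < VL;                 // what setvl's predicate encodes
    bool Active = ImpliedPred && CondMask[Lane];  // combined with the user mask
    if (Active)
      Dst[Lane] = A[Lane] + B[Lane];
  }
}
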
> >
> > Right.  In that way the RISC V method is very much like what the old
> > Cray machines did with the Vector Length register.
> >
> > So in LLVM IR you would have "setvl" return a predicate and then apply
> > that predicate to operations using the current select method?  How does
> > instruction selection map that back onto a simple setvl + unpredicated
> > vector instructions?
> >
> > For conditional code both vector length and masking must be taken into
> > account.  If "setvl" returns a predicate then that predicate would have
> > to be combined in some way with the conditional predicate (typically via
> > an AND operation in an IR that directly supports predicates).  Since
> > LLVM IR doesn't have predicates _per_se_, would it turn into nested
> > selects or something?  Untangling that in instruction selection seems
> > difficult but perhaps I'm missing something.
>
> My idea is for the RISC-V backend to recognize when a setvl intrinsic has
> been used, and replace the use of its value in AND operations with an
> all-true value (with constant folding to remove unnecessary ANDs), and then
> replace any masked instructions (generally loads, stores, anything else
> that might generate an exception or modify state that it shouldn't) with
> target-specific nodes that understand the dynamic vlen.
>
> This could be part of lowering, or maybe a separate IR pass, rather than ISel.
> I *think* this will work, but if someone can come up with some IR where it
> wouldn't work then please let me know (e.g. global-state-changing instructions
> that could move out of blocks where one setvl predicate is used and into one
> where another is used).
>
> Unfortunately, I can't find a description of the instructions included in
> the 'V' extension in the online manual (other than setvl or configuring
> registers), so I can't tell if there's something I'm missing.
>

RVV is a little bit behind SVE in the process :-) On the whole it's
following the style of vector processor that has had several
implementations at Berkeley, dating back a decade or more. The set of
operations is pretty much nailed down now, but things such as the exact
instruction encodings are still in flux. There is an intention to get some
experience with compilers and FPGA (at least) implementations of the
proposal before ratifying it as part of the RISC-V standard. So details
could well change during that period.