[llvm-dev] [RFC] Introduce a new stepvector operation

Thu Jan 21 14:45:01 PST 2021

> Playing Devil's Advocate here...

> Shuffle vector accepts any mask, right? Would we be in the business of scaling *any* mask the user gives with '...'?

> E.g. a zip_lo would be <0, 4, 1, 5, ...>, right? If that's true, I'm assuming that we could pluck any number of elements from each operand in any order. Is that what you were thinking? Or are there restrictions on what this generic form can accept?

A lot of these more complicated shuffles would require us to be able to use `vscale` in expressions. For example, zip_lo is actually <0, vscale * 4, 1, (vscale * 4) + 1, ...>

> And also consider something like reverse:

> %r = shufflevector <vscale x 4 x float> %1, <vscale x 4 x float> undef, <4 x i32> <i32 3, i32 0, ...>

> That's ambiguous. Does it mean:

> <7, 6, 5, 4, 3, 2, 1, 0>

> Or:

> <3, 2, 1, 0, 7, 6, 5, 4>

For your reverse example, you actually will get <3, 0, -3, -6, ...>. Reverse would be <(vscale * 4) - 1, (vscale * 4) - 2, ...>.

As a starting point, I'm only suggesting we add linear patterns. <n, m, ...> is never ambiguous; in geometric terms, n is the origin of a ray, and m - n is the vector that forms the direction of the ray. I think this would give us a lot of value: it would enable the loop vectorizer to use getStepVector for scalable vectors, it would enable many new shufflevector patterns, it would make the IR easier to read, and it might even make the compiler faster by not having to iterate over masks to see if they are a stepvector mask. (I don't know how often we do this, if at all)

If we allowed vscale and arithmetic operators to be used in vector patterns, it would enable more shuffles. This may be worth doing, but it might overcomplicate the code, and it would require `vscale` to be a constant. If we allowed non-linear patterns, it would enable even more. No programming language that I'm familiar with that has this sort of thing tries to support non-linear patterns.

Personally, I think shufflevector tries to do too much. For the shuffles that are really needed, and can't be represented with a linear pattern mask, we should probably just add intrinsics or instructions for them.

-----Original Message-----
From: Cameron McInally <cameron.mcinally at nyu.edu>
Sent: Thursday, January 21, 2021 2:00 PM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector operation

On Thu, Jan 21, 2021 at 3:41 PM Chris Tetreault <ctetreau at quicinc.com> wrote:
>
> I think the idea I've proposed is cleaner and more general than just inferring it by the type having vscale. It would be a shame if it were limited to just scalable vectors. I listed a few use cases for fixed width vectors as well. I'm not proposing this as a cure-all for shufflevector, it's just a more general version of the proposed stepvector that is useful in more cases. My personal opinion is that all of these "unary shuffles" should be a different instruction, but that's outside the scope of this RFC.

Agreed, we need to discuss which of these should be unary or binary shuffles. There are vector performance implications from what solution we choose.

E.g. having a unary extract_even|odd(...) will leave us with a 1/2 width vector result, that we'll need to concat with another 1/2 width vector. So I think an even|odd shuffle should be a binary shuffle.

But something like reverse(...), which provides a full vector result, should probably be unary.

That all said, IMO we should keep the shuffles binary and use an undef operator where desired. That gives the user the most flexibility, and doesn't require adding new operations to LLVM.

> As for it being unclear: complicated code is hard to read in general, so complicated pattern vectors will be as well. But to me, "stepvector" only has meaning if I am familiar with the use case and/or read the docs. <0, 1, ...> is obvious to anybody who's taken any sort of higher math class.

Playing Devil's Advocate here...

Shuffle vector accepts any mask, right? Would we be in the business of scaling *any* mask the user gives with '...'?

E.g. a zip_lo would be <0, 4, 1, 5, ...>, right? If that's true, I'm assuming that we could pluck any number of elements from each operand in any order. Is that what you were thinking? Or are there restrictions on what this generic form can accept?

And also consider something like reverse:

%r = shufflevector <vscale x 4 x float> %1, <vscale x 4 x float> undef, <4 x i32> <i32 3, i32 0, ...>

That's ambiguous. Does it mean:

<7, 6, 5, 4, 3, 2, 1, 0>

Or:

<3, 2, 1, 0, 7, 6, 5, 4>

I'd much rather see it as:

%r = shufflevector <vscale x 4 x float> %1, <vscale x 4 x float> undef, <vscale x 4 x i32> reverseinitializer

It's not, of course, the most expressive solution, but it's really just syntactic sugar for things we do today.

>
>
>
> -----Original Message-----
> From: Cameron McInally <cameron.mcinally at nyu.edu>
> Sent: Thursday, January 21, 2021 12:17 PM
> To: Chris Tetreault <ctetreau at quicinc.com>
> Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org
> Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector
> operation
>
> On Thu, Jan 21, 2021 at 3:02 PM Chris Tetreault <ctetreau at quicinc.com> wrote:
> >
> > Just to be clear: I think adding another magic named literal would be a mistake. I think being able to write <n, m, ...> would be great, and would obviate the need for many of the named shuffles in Eli's RFC as well as serve in David's use case.
> >
> > Thanks,
> >    Christopher Tetreault
> >
> > -----Original Message-----
> > From: Cameron McInally <cameron.mcinally at nyu.edu>
> > Sent: Thursday, January 21, 2021 11:39 AM
> > To: Chris Tetreault <ctetreau at quicinc.com>
> > Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org
> > Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector
> > operation
> >
> > On Wed, Jan 20, 2021 at 4:33 PM Chris Tetreault via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > >
> > > David,
> > >
> > >
> > >
> > >    Thanks for writing this up. I’d just like to speak to some concerns I have regarding shufflevector. As many of us know, shufflevector takes two vectors and a constant vector of i32, and does stuff. The constant shuffle mask can be scalable or fixed width. The shuffle mask is supposed to be an arbitrary constant vector, however for scalable vectors only zeroinitializer or undef are accepted. There are reasonable technical reasons for this state of affairs, but it reveals an issue: we don't really handle constant scalable vectors very well. Surely there are other similar issues throughout the codebase, but this is one I struggle with regularly so it sticks out in my mind.
> > >
> > >
> > >
> > >    However, we probably want to be able to use the stepvector in shufflevector. For instance, if we had a stepvector literal, then we could implement vector concatenation in terms of shufflevector:
> > >
> > >
> > >
> > > %a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32>
> > > stepvector
> > >
> > >
> > >
> > >    In fact, a lot of useful shuffles can be implemented in terms of stepvector multiplied or added to some constants. Pulling from Eli's list in https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_pipermail_llvm-2Ddev_2020-2DJanuary_138762.html&d=DwIGaQ&c=slrrB7dE8n7gBJbeO0g-IQ&r=O_4M49EtSpZ_-BQYeigzGv0P4__noMcSu2RYEjS1vKs&m=oUBhDwLSPYkEyfN2XuWJkl4sT_RMHdQH_vdinEiQb9E&s=jHVhfm-2eFcDjkeFYlDs9RnjI6Vfx5ZgKADkzhRJQ3I&e= , I can see:
> > >
> > >
> > >
> > > “
> > >
> > >    %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x
> > > i32> %v2, SHUFFLE_NAME
> > >
> > >
> > >
> > >    SHUFFLE_NAME can be one of the following (with examples of the equivalent <4 x i32> shuffles):
> > >
> > >    concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6,
> > > 7>)
> > > -> see above
> > >
> > >    split_low - Return the low half of the first operand (<0, 1>)
> > > -> stepvector of type <vscale x n/2 x i32>
> > >
> > >    split_high - Return the high half of the first operand (<2, 3>)
> > > -> (stepvector + splat(n/2)) of type <vscale x n/2 x i32>
> > >
> > >    zip_low - Zip together the low halves of the two operands (<0,
> > > 4, 1, 5>)
> > >
> > >    zip_high - Zip together the high halves of the two operands
> > > (<2, 6, 3, 7>)
> > >
> > >    unzip_even - Unzip the even elements of the two operands (<0,
> > > 2, 4,
> > > 6>) (stepvector + stepvector) of type <vscale x n x i32>
> > >
> > >    unzip_odd - unzip the odd elements of the two operands (<1, 3,
> > > 5,
> > > 7>) (stepvector + stepvector + splat(1)) of type <vscale x n x
> > > 7>i32>
> > >
> > > “
> > >
> > >
> > >
> > >    Unfortunately, all of these cannot be done because shufflevector only supports scalable undef or zeroinitializer. In order to support these cases, we would need to extend shufflevector to support stepvector (for concat), and arbitrary constant expressions for the rest. Supporting stepvector might not be so hard with the current scheme: if the shuffle is scalable, and the mask is <0, 1, ..., n - 1>, then the input mask was a scalable stepvector. However, I think this illustrates my proposal: vector pattern literals. They could look like this in IR:
> > >
> > >
> > >
> > >    <0, 1, ...> ; stepvector
> > >
> > >
> > >
> > >    This is also more flexible, because it enables lots of other scalable vector literals:
> > >
> > >
> > >
> > >    <7, 7, ...> ; splat(7) without writing that horrid
> > > insertelement/shufflevector thing
> > >
> > >    <0, 2, ...> ; unzip_even mask
> > >
> > >    <1, 3, ...> ; unzip_odd mask
> > >
> > >
> > >
> > >    The implementation for shufflevector would be straightforward because the mask for the two currently supported cases of zeroinitializer and undef (<0, 0, ...> <=> zeroinitializer and <undef, undef, ...> <=> undef) already follow the proposed scheme. This could also have the side benefits of making some IR easier to read (for very wide vectors, the fixed width stepvector could be more than 80 columns wide), and might result in efficiency gains in the compiler (don't need to walk a very wide vector to see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7, ..., 2048> once to ConstantPatternVector(0, 1)).
> > >
> > >
> > >
> > >    Sorry for rambling. I think I’ve personally come around to the idea that a constant would be good. However, a more flexible constant would be best if we’re going to use it to add a bunch of special cases to the codebase. Most special cases for scalable undef and zeroinitializer can be replaced with equivalent code that also handles vector pattern literals.
> > >
> > >
> > >
> > > Thanks,
> > >
> > >    Christopher Tetreault
> >
> > Glad to hear you've come around, Chris. Named shuffle constants are a good compromise, and I would very much like to see that happen as well.
>
> Oh, I misunderstood. If you want to go that route, why not use "vscale" to imply that the constants are also scaled?
>
> So something like:
>
> <vscale x 2 x i32> <i32 0, i32 2>
>
> would imply:
>
> <i32 0, i32 2, i32 4, i32 6, ...>
>
> I'm not sure how you'd formalize that yet though. And reverse would be a little weird, but I suppose it would be a little weird in your proposal too.
>
> Actually, after writing this out, I think named constants would be much more clear. There's no room for misinterpretation if they're named.