[llvm-dev] [RFC] Introduce a new stepvector operation

Thu Jan 21 12:02:02 PST 2021

Just to be clear: I think adding another magic named literal would be a mistake. I think being able to write <n, m, ...> would be great, and would obviate the need for many of the named shuffles in Eli's RFC as well as serve in David's use case.

Thanks,
   Christopher Tetreault

-----Original Message-----
From: Cameron McInally <cameron.mcinally at nyu.edu>
Sent: Thursday, January 21, 2021 11:39 AM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: David Sherwood <David.Sherwood at arm.com>; llvm-dev at lists.llvm.org
Subject: [EXT] Re: [llvm-dev] [RFC] Introduce a new stepvector operation

On Wed, Jan 20, 2021 at 4:33 PM Chris Tetreault via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> David,
>
>
>
>    Thanks for writing this up. I’d just like to speak to some concerns I have regarding shufflevector. As many of us know, shufflevector takes two vectors and a constant vector of i32, and does stuff. The constant shuffle mask can be scalable or fixed width. The shuffle mask is supposed to be an arbitrary constant vector, however for scalable vectors only zeroinitializer or undef are accepted. There are reasonable technical reasons for this state of affairs, but it reveals an issue: we don't really handle constant scalable vectors very well. Surely there are other similar issues throughout the codebase, but this is one I struggle with regularly so it sticks out in my mind.
>
>
>
>    However, we probably want to be able to use the stepvector in shufflevector. For instance, if we had a stepvector literal, then we could implement vector concatenation in terms of shufflevector:
>
>
>
> %a_concat_b = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32>
> stepvector
>
>
>
>    In fact, a lot of useful shuffles can be implemented in terms of stepvector multiplied or added to some constants. Pulling from Eli's list in https://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html, I can see:
>
>
>
> “
>
>    %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32>
> %v2, SHUFFLE_NAME
>
>
>
>    SHUFFLE_NAME can be one of the following (with examples of the equivalent <4 x i32> shuffles):
>
>    concat - Concatenate the two operands (<0, 1, 2, 3, 4, 5, 6, 7>) ->
> see above
>
>    split_low - Return the low half of the first operand (<0, 1>) ->
> stepvector of type <vscale x n/2 x i32>
>
>    split_high - Return the high half of the first operand (<2, 3>) ->
> (stepvector + splat(n/2)) of type <vscale x n/2 x i32>
>
>    zip_low - Zip together the low halves of the two operands (<0, 4,
> 1, 5>)
>
>    zip_high - Zip together the high halves of the two operands (<2, 6,
> 3, 7>)
>
>    unzip_even - Unzip the even elements of the two operands (<0, 2, 4,
> 6>) (stepvector + stepvector) of type <vscale x n x i32>
>
>    unzip_odd - unzip the odd elements of the two operands (<1, 3, 5,
> 7>) (stepvector + stepvector + splat(1)) of type <vscale x n x i32>
>
> “
>
>
>
>    Unfortunately, all of these cannot be done because shufflevector only supports scalable undef or zeroinitializer. In order to support these cases, we would need to extend shufflevector to support stepvector (for concat), and arbitrary constant expressions for the rest. Supporting stepvector might not be so hard with the current scheme: if the shuffle is scalable, and the mask is <0, 1, ..., n - 1>, then the input mask was a scalable stepvector. However, I think this illustrates my proposal: vector pattern literals. They could look like this in IR:
>
>
>
>    <0, 1, ...> ; stepvector
>
>
>
>    This is also more flexible, because it enables lots of other scalable vector literals:
>
>
>
>    <7, 7, ...> ; splat(7) without writing that horrid
> insertelement/shufflevector thing
>
>    <0, 2, ...> ; unzip_even mask
>
>    <1, 3, ...> ; unzip_odd mask
>
>
>
>    The implementation for shufflevector would be straightforward because the mask for the two currently supported cases of zeroinitializer and undef (<0, 0, ...> <=> zeroinitializer and <undef, undef, ...> <=> undef) already follow the proposed scheme. This could also have the side benefits of making some IR easier to read (for very wide vectors, the fixed width stepvector could be more than 80 columns wide), and might result in efficiency gains in the compiler (don't need to walk a very wide vector to see if it is a stepvector; can just canonicalize <0, 1, 2, 3, 4, 5, 6, 7, ..., 2048> once to ConstantPatternVector(0, 1)).
>
>
>
>    Sorry for rambling. I think I’ve personally come around to the idea that a constant would be good. However, a more flexible constant would be best if we’re going to use it to add a bunch of special cases to the codebase. Most special cases for scalable undef and zeroinitializer can be replaced with equivalent code that also handles vector pattern literals.
>
>
>
> Thanks,
>
>    Christopher Tetreault

Glad to hear you've come around, Chris. Named shuffle constants are a good compromise, and I would very much like to see that happen as well.