[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Fri Feb 7 14:59:47 PST 2020

> On Feb 7, 2020, at 12:39 PM, Eli Friedman <efriedma at quicinc.com> wrote:
>> 
>> Hi Eli,
>> 
>> Did you consider a design point between these two extremes?  You could
>> introduce one new instruction, something like “fixed shuffle vector” that
>> takes two vectors and an enum.  That would keep the structurally (and
>> representationally) different cases as separate instructions, without creating
>> a new instruction per fixed shuffle kind.
> 
> Well, there are sort of two forms of this.  I could add a new instruction, or I could add a new intrinsic (maybe using a metadata string to specify the shuffle).  An instruction is a ton of boilerplate.  And an intrinsic means we don't get shufflevector constant expressions, which are useful for optimization.

Oh yes, I didn’t necessarily mean an instruction; an intrinsic would make sense as well.  What value do you see shufflevector constant expressions providing?

In my opinion, LLVM constant expressions seem like a bad idea in general, and I’d rather see them go away over time - as one example, “constants that can trap” are often mishandled, and pattern matching is slower and more complex due to them.  In my mind, a better representation would be something that directly models what we need: global variable initializers can have relocations applied to them, so they have a very specific grammar of constant expression that represents this (symbol + constant, and symbol - symbol + constant on Mach-O targets).  The rest of the constant expressions could be dropped, simplifying the system.

> Either way, it's a bunch of extra work if we intend to eventually unify the two.  I don't see any scenario under which we don't want to eventually unify them.  The operations I'm adding are semantically the same as the equivalent fixed-width shuffles; we just can't represent the shuffle masks the same way.  And I think if we do end up changing the representation of scalable shufflevectors later, we'll be able to autoupgrade the existing ones.

It isn’t obvious to me that these need to be unified: we don’t have to have a single operation named “shuffle vector” that does all of the possible element permutations.  For example, vperm on PPC Altivec supports data dependent shuffle masks, and merging that into shuffle vector seems like a bad idea.

An alternate design could look like three things (again, ignoring intrinsic vs instruction):

“Data dependent shuffle” would allow runtime shuffle masks.
  -> “fixed mask shuffle” would require a statically determined shuffle mask.
     -> “well known target-independent shuffle” would support specific shuffles like zip, even/odd splits, splat, etc.
         -> “target specific shuffles” would be any number of special things that are shuffle like.

By defining a tower of these, instcombine would canonicalize to the most specific thing possible, and then targets and other transformations (e.g. at isel time) would handle the one representation that maps onto the thing their instruction can do.
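
As a rough sketch of what "canonicalize to the most specific thing possible" could mean for a fixed mask, the snippet below classifies an explicit mask as one of a few well known shuffles, or leaves it as a generic fixed mask shuffle. It is plain C++ with no LLVM dependencies; the enum, its members, and classifyMask are hypothetical names made up for illustration, not an existing or proposed API. It assumes both inputs have as many elements as the mask, with indices numbered as in shufflevector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tiers: a handful of "well known" shuffles plus a fallback
// meaning "keep this as a generic fixed mask shuffle".
enum class ShuffleKind { Splat, ZipLo, ZipHi, UnzipEven, UnzipOdd, GenericFixedMask };

// Classify a mask over two N-element inputs, where N == Mask.size().
// Indices 0..N-1 select from the first input, N..2N-1 from the second.
ShuffleKind classifyMask(const std::vector<int> &Mask) {
  assert(!Mask.empty() && "mask must be non-empty");
  const int N = static_cast<int>(Mask.size());
  bool Splat = true, ZipLo = true, ZipHi = true, Even = true, Odd = true;
  for (int I = 0; I < N; ++I) {
    const int M = Mask[I];
    Splat &= (M == 0);                           // every lane reads element 0
    ZipLo &= (M == I / 2 + (I % 2) * N);         // interleave the low halves
    ZipHi &= (M == N / 2 + I / 2 + (I % 2) * N); // interleave the high halves
    Even &= (M == 2 * I);                        // even elements of the concatenation
    Odd &= (M == 2 * I + 1);                     // odd elements of the concatenation
  }
  // Patterns can coincide for very short vectors; the first match wins.
  if (Splat) return ShuffleKind::Splat;
  if (ZipLo) return ShuffleKind::ZipLo;
  if (ZipHi) return ShuffleKind::ZipHi;
  if (Even) return ShuffleKind::UnzipEven;
  if (Odd) return ShuffleKind::UnzipOdd;
  return ShuffleKind::GenericFixedMask;
}
```

A pass in the instcombine role would rewrite the operation to the returned kind whenever it is not GenericFixedMask, so that later consumers only have to recognize the named form.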

The advantage of this approach is that the helper functions (e.g. the methods on the instruction) can be specific to the use case, e.g. “fixed mask shuffle” returns an ArrayRef<int> or whatever, and the “well known target independent shuffle” operation could have a way to turn the enum into an ArrayRef<int> for the case when you want to expand it out.
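
To illustrate the "turn the enum into an ArrayRef<int>" helper, here is a minimal sketch in plain C++ that uses std::vector<int> in place of ArrayRef<int>. WellKnownShuffle and expandMask are hypothetical names chosen for the example, and the set of shuffles is only the few mentioned above (splat, zip, even/odd split); a real design would need to settle the list and handle scalable vectors, where the mask cannot be written out like this.

```cpp
#include <cassert>
#include <vector>

// Hypothetical enum of well known target-independent shuffles.
enum class WellKnownShuffle { Splat0, ZipLo, ZipHi, UnzipEven, UnzipOdd };

// Expand a well known shuffle of two NumElts-wide inputs into an explicit
// mask. Indices 0..NumElts-1 select from the first input and
// NumElts..2*NumElts-1 from the second, as in shufflevector.
std::vector<int> expandMask(WellKnownShuffle Kind, int NumElts) {
  assert(NumElts > 0 && NumElts % 2 == 0 && "sketch assumes an even element count");
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  for (int I = 0; I < NumElts; ++I) {
    switch (Kind) {
    case WellKnownShuffle::Splat0: // broadcast element 0 of the first input
      Mask.push_back(0);
      break;
    case WellKnownShuffle::ZipLo: // a0, b0, a1, b1, ...
      Mask.push_back(I / 2 + (I % 2) * NumElts);
      break;
    case WellKnownShuffle::ZipHi: // same interleave, starting at the midpoint
      Mask.push_back(NumElts / 2 + I / 2 + (I % 2) * NumElts);
      break;
    case WellKnownShuffle::UnzipEven: // even elements of the concatenation
      Mask.push_back(2 * I);
      break;
    case WellKnownShuffle::UnzipOdd: // odd elements of the concatenation
      Mask.push_back(2 * I + 1);
      break;
    }
  }
  return Mask;
}
```

For example, expandMask(WellKnownShuffle::ZipLo, 4) yields the mask 0, 4, 1, 5, which is what code that only understands explicit masks would consume.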

Again, to be clear, this is true regardless of whether they are intrinsics or instructions.  If they are instructions, these helpers would be methods; if they are intrinsics, they would be global functions that assert on the opcode.

From a historical perspective, the ShuffleVector design was intended to allow significant mid-level optimizations that merged and transformed shuffle masks.  In practice though, instcombine (for example) has had to stay very "hands off" with shuffles in general because it is too easy to produce something that cannot be efficiently code generated.  If we canonicalize towards “well known” shuffles, we could improve this, because we can expect targets to efficiently handle the well known shuffles.

>> Relatedly, how do you foresee canonicalization (in instcombine and inst
>> selection) working for these?  If not for compatibility, it would make sense to
>> canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would
>> probably introduce a bunch of regressions for various targets.
> 
> I'm thinking that we don't use the new named shuffles for fixed-width shuffles at the IR level.  

Is this out of conservatism because of the amount of existing code that works on ShuffleVector?  While variable length vectors will always be special in some ways, it seems useful to make their representation as consistent with static length vectors as possible.

Representing even fixed length shuffles with an explicit representation seems like it would make pattern matching the specific cases easier, make the IR easier to read, and provide a single canonical representation for each mask.  The generic pattern matching stuff (e.g. PatternMatch.h, DAG ISel) could paper over the representational differences as well.

-Chris