[llvm-dev] [RFC] Extending shufflevector for vscale vectors (SVE etc.)

Thu Feb 6 07:51:45 PST 2020

> On Feb 5, 2020, at 18:01, Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On Jan 29, 2020, at 4:48 PM, Eli Friedman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> Currently, for scalable vectors, only splat shuffles are allowed; we're considering allowing more different kinds of shuffles.  The issue is, essentially, that a shuffle mask is a simple list of integers, and that isn't enough to express a scalable operation.  For example, concatenating two fixed-length vectors currently looks like this:
>> 
>> Proposed IR syntax:
>> 
>> %result = shufflevector <vscale x 4 x i32> %v1, <vscale x 4 x i32> %v2, SHUFFLE_NAME
>> 
>> Alternatives:
>> 
>> Instead of extending shufflevector, we could introduce a dedicated intrinsic for each common shuffle.  This is less readable, and makes it harder to leverage existing code that reasons about shuffles.  But it would mean fewer changes to existing code.
> 
> Hi Eli,
> 
> Did you consider a design point between these two extremes?  You could introduce one new instruction, something like “fixed shuffle vector” that takes two vectors and an enum.  That would keep the structurally (and representationally) different cases as separate instructions, without creating a new instruction per fixed shuffle kind.
> 
> Relatedly, how do you foresee canonicalization (in instcombine and inst selection) working for these?  If not for compatibility, it would make sense to canonicalize from shuffle vector to the ‘fixed’ formats, but doing that would probably introduce a bunch of regressions for various targets.
> 
> -Chris

This is a fair point.  It seems to me that the motivation is to make vector shuffling generic enough to support both fixed and scalable vectors, which is a sensible and elegant approach.  However, it may wreak havoc throughout parts of the code base and/or regress several targets until the offending code is identified and properly massaged, which would take considerable work.  Or not.  Though the risk seems real, it would be interesting to get at least a rough assessment of the damage, which a rough prototype of this approach could provide.
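To make the underlying problem concrete: a fixed-length shuffle mask is a literal list of lane indices whose length equals the result's lane count. The minimal sketch below uses hypothetical helpers (`concat_mask`, `fixed_concat_ir`, `scalable_concat_mask`) written in plain Python, not any LLVM API. It builds the concatenation mask and prints the corresponding fixed-length `shufflevector`. For a `<vscale x 4 x i32>` operand, the mask would need 8 * vscale entries. Because vscale is only known at run time, no single literal list can encode the operation.

```python
from typing import List


def concat_mask(n: int) -> List[int]:
    """Mask concatenating two n-lane vectors: lanes 0..n-1 come from the
    first operand, lanes n..2n-1 from the second (shufflevector numbers the
    second operand's lanes starting at n)."""
    return list(range(2 * n))


def fixed_concat_ir(n: int) -> str:
    """Textual IR for a fixed-length concatenation of two <n x i32> values."""
    mask = ", ".join(f"i32 {i}" for i in concat_mask(n))
    return (f"%result = shufflevector <{n} x i32> %v1, <{n} x i32> %v2, "
            f"<{2 * n} x i32> <{mask}>")


def scalable_concat_mask(min_lanes: int, vscale: int) -> List[int]:
    """For <vscale x min_lanes x T>, the actual lane count is
    vscale * min_lanes, so the mask can only be materialized once vscale
    is known, i.e. not as a compile-time constant in the IR."""
    return concat_mask(vscale * min_lanes)


if __name__ == "__main__":
    print(fixed_concat_ir(4))
    # The "same" scalable shuffle needs a different mask on each implementation.
    for vscale in (1, 2, 4):
        print(vscale, len(scalable_concat_mask(4, vscale)))
```

The fixed case yields one literal mask per vector width. The scalable case yields a different list for every vscale, which is why the proposal replaces the list with a named shuffle kind.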

Using intrinsics has its merits too, especially in the simpler form suggested by Chris.  Though it would be less generic and would duplicate the existing shuffles for fixed-length vectors, it would probably be far less risky, particularly because it is less invasive.  If anything, this approach would also be easier to prototype.

If my assessment is near the mark, then, strategically, the latter approach seems to allow making some progress sooner.  Once everyone has a better understanding of shuffling scalable vectors, both approaches could be reconsidered, this time informed by that preliminary implementation.

Back to y'all.

__ 
Evandro Menezes ◊ SiFive ◊ Austin, TX